Fair Use of Generative Artificial Intelligence Training Data

Original title: 生成式人工智能训练数据的合理使用

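Author: Qiao Yuhan (乔雨涵)
Student ID: 2021******
Degree: Master's
Email: qia******.cn
Defense date: 2024.05.31
Supervisor: Jiang Ge (蒋舸)
Discipline: Law
Pages: 106
Confidentiality level: Public
Degree-granting unit: 066 School of Law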

Keywords

Generative artificial intelligence (生成式人工智能); Training data (训练数据); Fair use (合理使用); Personal information (个人信息); Enterprise data (企业数据)

Abstract

This thesis addresses the training data of generative artificial intelligence. Taking utilitarian welfare maximization as its theoretical basis, it examines how rights and freedom of action should be allocated across different types of training data, and it proposes "fair use" as a solution to the problems it identifies.

The thesis first divides generative AI training data into three objects of rights: personal information, works data, and enterprise data. It explains why the discussion centers on personal information rather than other personality rights, and on enterprise data rather than trade secrets. It also sets out the common logic under which "fair use" applies to all three types of rights, aiming to show how the abstract, unified legal reasoning beneath fair use can cut across the boundaries of individual branches of law. Treating the three types together in this integrated way also minimizes the cost of institutional design.

Second, through case studies, the thesis surveys how the interests of the relevant rights holders currently play out for each type of training data, namely the infringement risks and the difficulty of obtaining an exemption. For personal information, strong protection is out of balance with weak utilization: there are risks such as unlawful use, and exemption through the statutory exceptions is hard to obtain. For works data, high copyright costs are out of balance with low expected returns: there is a risk of copyright infringement, and the existing limitations on rights do not cover training uses. For enterprise data, the balance tilts toward protecting the holders of data rights: there is a risk of unfair competition liability, and the competition law route offers no limitation on rights that could ground an exemption.

To address these problems, the thesis analyzes why fair use should apply. First, from a utilitarian perspective, AI acquisition of data is legitimate and of considerable benefit to China's economic and industrial development. Second, economic analysis shows a market failure in AI acquisition of training data, while societal expectations are shifting in favor of that acquisition. Third, the thesis reflects on the excessively formal protection of personal information, considers technological change together with the factors weighed in copyright fair use, and takes into account enterprise data use, the market for data derivatives, and the public interest. It concludes that a fair use regime for training data should be established in these circumstances.

At the level of institutional design, for personal information, the "public information" case among the statutory exceptions to the lawfulness basis should be refined, "transformativeness" should be defined through "non-identifying use," and relative anonymization should be recognized in order to safeguard the rights and interests of personal information subjects. For works data, the thesis rejects adding an AI-specific provision to the enumerated fair use cases and instead advocates case-by-case determination and a commercial-use principle. It proposes considering "transformative use" at the input stage and, at the output stage, a division of liability between users and technology providers. For enterprise data, the thesis first analyzes the limitations of the various regulatory models. It then suggests that, under the data-specific provision of the anti-unfair competition law, courts should look to "substantial substitution" rather than "improper acquisition," and that adjudication should return from a "rights protection" model to a "legitimacy of conduct" model.
