Switzerland Update: Data Fodder for AI: How to Successfully Commercialize Datasets
03/04/2024
Artificial intelligence needs to be fed with data in order to generate appealing output. This gives data owners new ways to commercialize their data, i.e., to license it to third parties in exchange for compensation. In the 12th article of our AI series, we show what you should consider when drafting data license agreements.
- Licensing data as a business model?
Feeding AI models with data of the right quality and quantity is central to the success of AI applications. The more accurate and comprehensive this training data is, the better the output. This opens up a new source of income for owners of data sets. For example, the operator of the platform Reddit licenses the user-generated content on it to Google for around USD 60 million per year.
But it’s not just tech giants that can benefit from this relatively new market. These models can also be of interest to SMEs. Let’s say that a heating installer has installed and maintained heat pumps of a certain brand for the past few decades. Its customer file shows which repair services it performed for which customer and when. From this information, insights about wear can be drawn that may be of interest to the manufacturer. The manufacturer could, for example, develop a preventive maintenance tool and make it available to its distributors. For the heating installer itself, implementing such a tool at its own expense will not be worthwhile. Transmitting this data from the heating installer to the manufacturer therefore makes economic sense for both parties. But how is that supposed to work?
- Specifics of Data License Agreements
Swiss law does not recognise a general exclusive right to data; it grants protection only selectively. For example, the unauthorized disclosure of manufacturing or trade secrets is prohibited. If the data are personal data, i.e. data relating to an identified or identifiable person, the data subject may, under certain circumstances, prohibit the processing of the data and, in particular, its disclosure. If the requirements for copyright protection are fulfilled (i.e. in the case of intellectual creations with an individual character), an exclusive right exists, but it applies only to the work itself or parts of it, not to each individual data point contained in it. If the data or the data set is a market-ready work product, it may not be taken over and exploited by means of technical reproduction processes without reasonable effort of one’s own. Ownership in the property-law sense can exist only in the carrier medium, not in the data stored on it.
The use of data by third parties can therefore often not be prevented by legal instruments. Anyone who wants to use data is therefore generally not obliged to obtain permission from anyone to do so. At first glance, the commercialization of data does not seem so easy, as there is no incentive on the part of the buyer to pay for the use of data – especially if the data is publicly available.
If you want a third party to pay for the use of data, you can achieve this, for example, by means of de facto control instruments, in particular by protecting your data from unauthorised access and granting access only for a fee.
If the owner of the data makes it available to a third party, it is advisable to stipulate in the contract what this third party is allowed to do with the data. In particular, it should be ensured that the third party is not allowed to pass on the data without restriction. Otherwise, the first data owner loses control over “his” data and thus also over its commercialization.
Furthermore, in the case of machine-generated data in the EU area, the requirements of the Data Act must still be observed. However, we will discuss these in a separate blog post. In this blog post, we’ll show you what data owners need to look out for if they want to successfully commercialize their data.
- What you should pay attention to before sharing data
The data provided for licensing may contain various information for which contractual or regulatory restrictions must be observed. Particular attention should be paid to the following:
Personal data: If your data set contains personal data, e.g. the name of the invoice recipient, the requirements of data protection law must be observed. Since personal data is usually not relevant for the training of AI models and its use brings additional regulatory requirements, it will be advisable in many cases to anonymize the data before transmission. Depending on the data set, simply removing the name, a telephone number or the address will not be sufficient for effective anonymization, because data is only anonymized if it can no longer be assigned to a person, or could only be assigned again with disproportionate effort. If anonymisation is not possible or if the personal data is relevant for further use, this must be communicated transparently to the data subjects, e.g. in a data protection declaration. Depending on the setup, the consent of the data subjects is necessary, e.g. if the disclosure is not in accordance with the general principles of data processing and no overriding interests of the controller justify it, or if the disclosure concerns particularly sensitive personal data (e.g. health data). Within the scope of application of the GDPR, unlike under Swiss data protection law, a legal basis for data processing is required, which will regularly complicate the disclosure of personal data. We will discuss which requirements must be met for the training of AI models in a separate blog post.
Contractual confidentiality: Confidentiality obligations are often casually recorded in contracts without the parties being aware of the possible consequences. For example, information about the sales volume of a contract or the specific services used may well be covered by a contractual obligation of confidentiality. Anyone who discloses a manufacturing or trade secret that they are obliged to keep under a statutory or contractual duty, and anyone who exploits such a disclosure for themselves or another, can be punished upon complaint (Art. 162 SCC). The disclosure of manufacturing or trade secrets is also prohibited if they have been unlawfully obtained (Art. 6 UWG). Therefore, if data from business operations is to be reused, care must be taken to ensure that it is selected or prepared in such a way that no contractual confidentiality obligations are violated. Ideally, the confidentiality clause already specifies the secondary uses for which contract data may be used.
Protection of statutory secrecy: There are various statutory confidentiality obligations, such as professional secrecy (Art. 321 SCC), which binds e.g. lawyers and doctors, but also the so-called “small professional secrecy”, which covers any person who intentionally discloses to third parties secret personal data of which they became aware in the exercise of their profession (Art. 62 FADP).
Copyright: If the dataset contains photographs or other copyrighted content, the consent of the copyright holder is generally required for the transfer (see our blog post “AI and Copyright: Responsibility of Providers and Users”). In the case of internally generated content, the employer is usually allowed to dispose of it and pass it on accordingly, because the rights to work results created within the company usually lie with the employer. However, if the data set contains content from third parties, it must be checked on a case-by-case basis whether the disclosure is permissible. As a rule, this will require the consent of the author.
Antitrust: Particular caution should be exercised if the data is supplied between competitors in the market and this data facilitates the implementation of agreements in connection with prices, quantities or territories.
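On the personal data point above, the following minimal sketch illustrates why removing names alone may not be enough. It counts how many records share the same combination of remaining attributes (so-called quasi-identifiers). The field names and sample records are purely hypothetical, and the check is only a rough technical indicator, not a legal test for anonymization.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for the given quasi-identifier fields."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical service records from which names were already removed.
records = [
    {"postcode": "8001", "install_year": 2009, "repair": "compressor"},
    {"postcode": "8001", "install_year": 2009, "repair": "valve"},
    {"postcode": "3920", "install_year": 1998, "repair": "sensor"},
]

k = smallest_group_size(records, ["postcode", "install_year"])
print(k)  # a value of 1 means at least one record is unique
```

If the smallest group has size 1, at least one customer could potentially be singled out by combining the remaining attributes with other sources, so further generalisation or removal of fields may be needed.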
- You should settle these points in the contract
Contents of the dataset: What kind of data does the dataset contain? It should be clear to the recipient what kind of data he receives and what characteristics it has. This includes, for example, the quality (is there a requirement that every single data pair within the data set is correct and complete?), but also the actual content of the data set. This sounds banal and self-evident, but in practice a relatively precise description of the data is of great relevance. For example, not all pulse data is alike: it can be decisive whether the data was collected by professionals using professional, calibrated measuring devices or by means of private fitness trackers. To prevent problems, we recommend also specifying what should not be included in the data set, such as personal data or trade secrets.
The agreement should also state whether it covers a one-time transfer of the data set or also regular updates. It is also important to define a cut-off date on which the data set will be extracted and to record how up-to-date the data is at that point in time (e.g. over what period or up to which end date the data was collected). Especially in the case of larger data sets, it is advisable to “label” the data sets by means of hash values.
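Labelling a data set with a hash value could look roughly like the following sketch. It assumes SHA-256 as the hash function and uses hypothetical sample content and a hypothetical cut-off date; the resulting label could, for instance, be recorded in an annex to the agreement.

```python
import hashlib
import json


def dataset_fingerprint(data: bytes, cutoff_date: str) -> dict:
    """Build a label for a data set: its SHA-256 hash, size and cut-off date."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "cutoff_date": cutoff_date,  # date on which the data set was extracted
    }


# Hypothetical extract; in practice this would be the exported file's content.
sample = b"customer_id,repair_date,component\n17,2023-11-02,compressor\n"

label = dataset_fingerprint(sample, "2024-03-31")
print(json.dumps(label, indent=2))
```

Because any change to the data yields a different hash, both parties can later verify which version of the data set was actually delivered. For very large files, the hash would typically be computed by reading the file in chunks rather than loading it into memory at once.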
Preparation, format and structure of the data/dataset: The highest quality data will not help the recipient if they cannot read and edit it. It is also not uncommon for a recipient to obtain data records from different providers, which he then wants to connect for his purposes. This is only possible if the data sets, including the data they contain, are compatible with each other or at least can be made compatible. It is therefore advisable to contractually define the format and structure of the data/data set.
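A contractually agreed structure can also be checked automatically on delivery. The following is only a sketch under assumptions: the format is taken to be CSV, and the column names in `AGREED_COLUMNS` are hypothetical placeholders for whatever the parties actually define.

```python
import csv
import io

# Hypothetical column list as it might be fixed in an annex to the agreement.
AGREED_COLUMNS = ["customer_id", "repair_date", "component"]


def check_header(csv_text: str, agreed_columns: list[str]) -> list[str]:
    """Compare the CSV header with the agreed columns.
    Returns a list of problems; an empty list means the header matches."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    missing = [c for c in agreed_columns if c not in header]
    extra = [c for c in header if c not in agreed_columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems


delivery = "customer_id,repair_date,component\n17,2023-11-02,compressor\n"
print(check_header(delivery, AGREED_COLUMNS))
```

A check of this kind also helps to spot columns that should not be in the delivery at all, such as a name field.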
Personal data: Can the recipient assume that the data record does not contain any personal data or is the data only pseudonymised, so that it is relatively easy to assign it to a natural person? From the point of view of the owner of the data, it makes sense to oblige the recipient to take measures to prevent re-identification and, in particular, to refrain from such actions himself.
Provision of the data: For example, does the recipient only have temporary remote access with a clearly limited right of use? If so, is it necessary to ensure a certain availability of access and is the recipient allowed to make copies of the data set? Or is the data set delivered to the recipient in one go, and may the recipient then store it on its own systems?
Use of the data by the recipient: The data provider should consciously think about the purposes for which the data could potentially be used and which of these uses he does not want to allow for economic or ethical reasons. In order to prevent the owner of the data from losing control over “his” data or over its commercialisation, it should be regulated whether and, if so, under what conditions and to what extent the data set may also be made available by the recipient to other users. Since the use of the data by third parties, once it has been passed on by the recipient, can hardly be prevented by the owner of the data (unless there are additional rights such as copyrights), it may make sense to provide for a contractual penalty in the event of an unauthorized disclosure of the data.
Modalities of consideration: There are many conceivable ways in which access to data can be remunerated. In the case of a one-time delivery of the data, a lump sum payment may make sense; for recurring deliveries/updates, a subscription model is usually more suitable. Of course, profit or revenue sharing in the model trained with the data or the waiver of future usage fees for the AI application are also possible.
Obligations in the handling of the data set: It should be defined which technical and organizational measures both parties have to implement to ensure the integrity, availability and confidentiality of the data set – as is already often done in connection with the processing of personal data.
Warranty and liability: The recipient will usually have an interest in ensuring that the data provided by the supplier is correct. In practice, it will also be relevant for the recipient that the disclosure and agreed use of the data does not infringe any third-party rights, in particular the above-mentioned rights that may exist over data, such as copyright. The recipient may also demand indemnification in the event of such an infringement. Whether and to what extent the data holder can and should agree to this depends on the respective setting. In the case of very large data sets, such agreements are likely to lead to an excessive liability risk for the owner of the data in many cases. It is therefore important to coordinate the scope of the warranty, the liability and the remuneration.
Contract Termination: What happens when the contract is terminated? Are there any specific deletion obligations? Since there is no universally applicable right of exclusivity for data, the obligation to cease use must be established on a purely contractual basis. It is important to note that it is often difficult to prove that a licensee continues to use a data set without permission after the contract has ended: it is often not possible for third parties to see which data a company is working with, and it will often be impossible or difficult to determine the origin of the data.
- Alternative licensing models?
Setting up direct licensing of content, as in our heating installer example, is time-consuming in individual cases and this effort will often not be worthwhile. It is therefore to be expected that certain standards for the licensing of data will be established in the medium term. This standardization can be done in a number of ways – certain models have already been tried and tested in the music industry.
First, aggregators can collect content in the market, process it if necessary and then license it on. Instead of launching an AI image generator itself, Getty Images could make its image database available to AI model providers for training purposes. In the music industry, music publishers (e.g. Sony Music Publishing) play such a role: they sign songwriters and take over licensing to collecting societies, record labels and media companies.
Furthermore, it would be theoretically possible for collecting societies to take over the exploitation on behalf of the rightholders in specific cases. This model does not allow the rights holders to prevent the use, but they are remunerated for it. In the music industry, this model is used, for example, when a song is played on the radio. A composer cannot prevent a radio station from playing his songs, but the radio station must pay the collecting society a remuneration, which is passed on to the composer according to a distribution key. However, a direct adaptation of this model to the licensing of data sets is rather unlikely, especially since this model is linked to the copyrights of the owners. Some of the data sets that are held by companies, are not publicly available and are of interest for training AI models will not qualify for copyright protection, so often only factual or contractual restrictions can limit use by third parties. Thus, with a few exceptions aimed at specifically defined situations (see above under “Specifics of Data License Agreements”), there is no “right” that can be asserted against any third party that does not have a license.
So-called model contracts, which are used very frequently, especially in the context of open source software, look promising. They allow the rights holder to publish the content under a known license and the user to use the content free of charge without having to obtain permission from the rights holder. There are already open source licenses that are specifically tailored to the licensing of AI models, e.g. those of RAIL (Responsible AI Licenses). RAIL has announced that it will publish a model licence specifically tailored to the licensing of data (OpenRAIL-D). It remains to be seen whether standards for fee-based licensing will also be established. In order to exploit the full potential of data licensing, this would definitely be welcome.
By Vischer, Switzerland, a Transatlantic Law International Affiliated Firm.
For further information or for any assistance please contact switzerland@transatlanticlaw.com
Disclaimer: Transatlantic Law International Limited is a UK registered limited liability company providing international business and legal solutions through its own resources and the expertise of over 105 affiliated independent law firms in over 95 countries worldwide. This article is for background information only and provided in the context of the applicable law when published and does not constitute legal advice and cannot be relied on as such for any matter. Legal advice may be provided subject to the retention of Transatlantic Law International Limited’s services and its governing terms and conditions of service. Transatlantic Law International Limited, based at 84 Brook Street, London W1K 5EH, United Kingdom, is registered with Companies House, Reg Nr. 361484, with its registered address at 83 Cambridge Street, London SW1V 4PS, United Kingdom.