Copyright: Text and Data Mining – New Rules?

When implemented, the ‘Copyright in the Digital Single Market Directive’ will bring in new rules around text and data mining. Deirdre Moynihan, Partner at Kemp IT Law, shows us what (and what not) to expect.

The new EU Directive on Copyright in the Digital Single Market[i] (the ‘DSM Directive’) has generated considerable controversy and discussion. Many have argued it will lead to the death of the Internet as we know it because of its provisions on upload filters and the link tax. Those points are outside the scope of this blog, which focuses on a new and potentially valuable right in the data analytics field, but you can find more information on them in our related blogs and vlogs.

Article 2(2) of the DSM Directive defines text and data mining (‘TDM’) as:

“any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.

The ability to legally perform TDM is becoming increasingly important as new methods and technologies are created to perform sophisticated data analytics and provide new services based on such analytics. Technically, most TDM software/activities require a copy of the content to be TDM-ed. Currently, however, in the UK and EU, there is no blanket exception from copyright or database right protection that allows TDM: copying third-party text or data for analytics purposes requires a licence from the rightsholder or reliance on the limited and narrowly construed fair dealing defences. (This is in contrast to the US position, where TDM is generally regarded as “fair use” if there is lawful access to the underlying work.[ii])

The DSM Directive aims to address the restrictions on copying for TDM by introducing two new exceptions.

Articles 3 and 4 of the DSM Directive permit reproduction of copyrighted works and extraction of information from databases where the user performing TDM has “lawful access” to the protected work.

Lawful access is described as “access to content based on an open access policy or through contractual arrangements between rightsholders and research organisations or cultural heritage institutions, such as subscriptions, or through other lawful means”[iii].

Article 3 expressly permits “reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access”.

“Research organisations” are defined as universities, research institutes or any other entities, the primary goal of which is to conduct scientific research or to carry out educational activities involving also the conduct of scientific research, either on a not-for-profit basis or by reinvesting all the profits in their scientific research, or pursuant to a public interest mission recognised by a Member State. “Cultural heritage institutions” are defined separately as publicly accessible libraries or museums, archives, or film or audio heritage institutions.

Article 4 permits reproductions of, and extractions from, “lawfully accessible works” for TDM for any purpose provided that this activity has not been “expressly reserved by rightsholders in an appropriate manner”.

Little guidance is given in the DSM Directive on what amounts to an “appropriate” “express” reservation of TDM rights, aside from the following:

  • where works are made publicly available online, the DSM Directive stipulates that it will only be “considered appropriate to reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service”; and
  • in other arrangements, the reservation can be included as a contractual term or a unilateral declaration by the rightsholders.
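
By way of illustration only, a TDM tool could check a web page for a machine-readable reservation before copying it. The sketch below assumes the rightsholder signals the reservation with an HTML meta tag named `tdm-reservation` set to `1`. That tag name, the value and the function name `tdm_reserved` are assumptions made for this example; the DSM Directive does not prescribe any particular format, and a reservation could equally sit in a site's terms and conditions, which this check would not detect.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # Meta names are compared case-insensitively.
            self.meta[name.lower()] = (attr_map.get("content") or "").strip()


def tdm_reserved(html: str, meta_name: str = "tdm-reservation") -> bool:
    """Return True if the page carries the assumed TDM reservation tag.

    The tag name and the "1" value are illustrative assumptions,
    not a format laid down by the DSM Directive.
    """
    parser = _MetaCollector()
    parser.feed(html)
    return parser.meta.get(meta_name) == "1"
```

A result of `False` means only that this particular signal is absent. It does not establish that TDM is permitted, so the accompanying licence terms and notices would still need to be reviewed.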

In light of the above, and the fact that the DSM Directive will need to be implemented in each Member State for it to be effective, it’s not clear at this stage whether the generally accepted shorthand approach of stating “all rights reserved” or “all rights not expressly granted are reserved by the copyright/database right holder” will be sufficient to prohibit TDM, or whether more explicit wording specifically restricting TDM is required to amount to an “express” reservation in “an appropriate manner”.

To address this uncertainty, we expect that, as a matter of best practice and to avoid any issues of interpretation, rightsholders who wish to restrict TDM will introduce wording that: (a) expressly prohibits TDM; (b) incorporates an acknowledgement from the user that the prohibition on TDM is made/given in an “appropriate manner”; and (c) permits the rightsholders to check that the prohibition has been complied with. It’s therefore key for users to check the terms of licences or notices accompanying protected works to establish whether TDM is permitted.

While superficially the TDM exceptions in the DSM Directive appear to grant new exceptions to copyright and database right infringement, in our view, they are likely to have little impact on users’ rights on a day-to-day basis, simply because rightsholders will be able to circumvent the Article 4 exception by specifically and expressly prohibiting TDM.

For the position in the UK post-Brexit transition, please see section J of our ‘Algo IP’ white paper.

This blog was first published as part of the white paper which you can read in full here: Algo IP: Rights in Code – 2020 Update



[i] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

[ii] See, e.g., Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014), White v. West (S.D.N.Y. 2014), Fox v. TVEyes (S.D.N.Y. 2014), Authors Guild v. Google, 770 F.Supp.2d 666 (S.D.N.Y. 2011), Kelly v. Arriba-Soft, 336 F.3d 811 (9th Cir. 2003), A.V. v. iParadigms, LLC (4th Cir. 2009), Perfect 10 v. Amazon, 508 F.3d 1146 (9th Cir. 2007), and Field v. Google, 412 F.Supp.2d 1106 (D. Nev. 2006).

[iii] Per recital 14 of the DSM Directive.

