Why the training of Large Language Models falls within the scope of the Text and Data Mining exceptions
The question of whether AI training falls within the scope of Text and Data Mining (TDM) exceptions is a topic of intense debate. For some time, legal scholars and stakeholders have argued both for and against the applicability of these exceptions to the training of large language models (LLMs). The introduction of the AI Act seemed to have quieted much of this discussion, as the Act explicitly acknowledges the relevance of TDM exceptions for AI training. However, a recent Resolution of the European Parliament (2025/2058(INI)) has reignited the debate by pointing out the shortcomings of the existing framework.
Statutory exceptions for Text and Data Mining were harmonised by the 2019 EU Directive on Copyright in the Digital Single Market (2019/790) (Articles 3 and 4). The Directive defines TDM very broadly as any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.
An LLM is a large probabilistic model of natural language: it contains information about patterns, trends and correlations between words and expressions. As such, training an LLM, whether it is static or dynamic (generative), corresponds closely to the definition of TDM in European law. Specifically, an LLM is not a ‘repository’ of training data from which that data could be retrieved in unmodified form. Although some LLMs have been reported to “regurgitate” portions of the data used in their training, this is a rare occurrence resulting from coincidence or deliberate hostile prompting.
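To make the notion of "patterns, trends and correlations" concrete, the following is a minimal sketch of a toy bigram model. It is an illustrative assumption only and is far simpler than any real LLM; the function name `train_bigram_model` and the three-sentence corpus are hypothetical. What the training step produces is a table of word-to-word probabilities derived from the texts.

```python
from collections import Counter, defaultdict


def train_bigram_model(documents):
    """Count how often each word follows another, then normalise to probabilities."""
    counts = defaultdict(Counter)
    for text in documents:
        words = text.lower().split()
        # Pair every word with the word that immediately follows it.
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1

    model = {}
    for previous, followers in counts.items():
        total = sum(followers.values())
        model[previous] = {word: n / total for word, n in followers.items()}
    return model


# Hypothetical miniature corpus, for illustration only.
corpus = [
    "the court applied the exception",
    "the court rejected the claim",
    "the exception covers research",
]

model = train_bigram_model(corpus)
print(model["the"])    # probabilities of the words observed after "the"
print(model["court"])  # probabilities of the words observed after "court"
```

The output of training here is statistical information about which words tend to follow which. A real LLM learns far richer correlations with a neural network, so this sketch should not be read as a description of how any particular model is built.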
The DSM Directive was proposed in 2016 and adopted in 2019, when the question of LLM training attracted little attention from the general public. The European legislator was nevertheless fully aware of the potential application of the TDM exceptions, as attested by a 2018 Briefing (PE 604.942) requested by the JURI committee. The definition of TDM in the Directive was deliberately broad in order to be future-proof. Moreover, recent legislation clearly demonstrates the legislator’s intention to include AI training within the scope of TDM exceptions:
Article 53(1)(c) of the AI Act (Regulation (EU) 2024/1689) explicitly requires providers of general-purpose AI models to put in place a policy to identify and comply with reservations of rights expressed by rightholders. This provision refers directly to Article 4(3) of the DSM Directive, which allows rightholders to opt out of TDM by expressly reserving their rights in an appropriate manner, such as machine-readable means for content made publicly available online. By requiring compliance with this opt-out mechanism, the AI Act recognises TDM exceptions as a potential legal foundation for AI training.
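As an illustration of what checking a machine-readable reservation might look like in practice, the sketch below uses Python's standard `urllib.robotparser` module to test whether a site's robots.txt disallows a given crawler. Treating robots.txt as the carrier of the reservation is an assumption made here for the sake of the example; neither the DSM Directive nor the AI Act prescribes a specific format. The user-agent name `ExampleTrainingBot`, the URL and the helper `is_crawling_reserved` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawling_reserved(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt content disallows this agent for this URL."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: one named crawler is excluded, all others are allowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

url = "https://example.org/articles/1"
print(is_crawling_reserved(EXAMPLE_ROBOTS_TXT, "ExampleTrainingBot", url))
print(is_crawling_reserved(EXAMPLE_ROBOTS_TXT, "OtherBot", url))
```

A check of this kind covers only one possible signal. Reservations may also be expressed through metadata or website terms and conditions, which this sketch does not examine, and whether any given signal is legally sufficient is a separate question.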
Therefore, LLM training falls within the scope of the TDM exceptions and, as such, does not require permission from the rightholder, as long as all the conditions of these exceptions are observed. In particular, Article 3 of the DSM Directive is an appropriate basis for LLM training carried out for research purposes. German courts have recently adopted a rather permissive interpretation of this exception in the Kneschke v. LAION case (5 U 104/24). Moreover, the above-mentioned Resolution of the European Parliament, while criticising the use of TDM exceptions for AI training, calls on the Commission “to ensure that activities conducted for scientific research (…) are not restricted”.
DISCLAIMER: Nothing in this statement is intended to constitute or should be interpreted as legal advice. Keep in mind that both TDM exceptions come with specific requirements.