Template for the Public Summary of Training Content for general-purpose AI models

In parallel to the Code of Practice process, the AI Office has also developed a template for the sufficiently detailed summary of training data that general-purpose AI (GPAI) model providers are required to make public under Article 53(1)(d) of the AI Act. Providers of all GPAI models placed on the Union market must fulfil this obligation, including providers of GPAI models released under free and open-source licenses[1], in so far as the models fall within the scope of the AI Act[2].

The Template is annexed to the Explanatory Notice[3], published on 24 July 2025, and aims to provide a common minimal baseline for the information to be made publicly available in the Summary of Training Content for GPAI models. The Template is closely linked to the providers’ obligations in relation to transparency and copyright that are detailed in the Code of Practice.

Objective of the Summary

Recital 107 AI Act explains that the objective of the Summary is to increase transparency on the content used for the training of general-purpose AI models, including text and data protected by law, and to make it easier for parties with legitimate interests, including rightsholders, to exercise and enforce their rights under Union law.

The Explanatory Notice elaborates on the objectives of the Summary in detail:

First, in relation to intellectual property rights, including copyright and related rights, transparency of the data used for the model training should help rightsholders obtain relevant information on the content used in the training of general-purpose AI models. This information is needed to facilitate the exercise of their fundamental right to intellectual property[4] and the fundamental right to an effective remedy in the enforcement of their rights, as provided for in Union law in the area of intellectual property rights. In the case of copyright and related rights, transparency of the training data will contribute to ensuring that GPAI model providers comply with Union law on copyright and related rights[5].

Second, transparency of the training data in the Summary may facilitate data subjects’ rights and more broadly support the enforcement of the Union data protection rules. In particular, this can be done by summarising all the relevant information together, such as information about the data scraped from the internet or collected by the provider through interactions with the model or other services and products. The information in the Summary is not meant to replace or affect the information that the providers of GPAI models should make available to data subjects under Union data protection law. In the context of the Summary, the interests of consumers and the protection of their consumer rights under Union law may also be relevant.

Third, transparency of the general characteristics of the content used for training may also assist providers integrating these models into downstream applications to assess the diversity of the data. This, in turn, will allow them to implement, where appropriate, mitigating measures to ensure that the fundamental rights to non-discrimination[6] and language and cultural diversity[7] are respected.

Fourth, greater transparency of the training data may also facilitate the fundamental right to receive and impart information[8] and allow researchers to exercise their freedom of science[9] to conduct scientific research. It can allow academic institutions and organisations to critically evaluate the implications and limitations of a particular GPAI model and the potential risks and harms associated with the data used.

Finally, transparency of the training data may also contribute to more transparent and competitive markets. For example, information about whether publicly available GPAI models have been used to train other models, in particular through model distillation, or whether a model has been trained on user data collected from providers’ own products and services, may help users and companies better understand how their data and models have been used and avoid potential lock-in effects.

Comprehensive scope of the training data and sufficient details

Information about the GPAI model provided in the Summary should cover data used in all stages of the model training, from pre-training to post-training, including model alignment and finetuning. This covers all sources and types of data, regardless of whether the data are protected or not, including by an intellectual property right.

The Template consists of three main sections:

General information: this section requires information allowing identification of the provider and of the model, and information on modalities, the size of each modality within broad ranges, as well as general characteristics of the training data.
List of data sources: this section requires disclosure of the main datasets that were used to train the model, such as large private or public databases, and a comprehensive narrative description of the data scraped online by or on behalf of the provider (including a summary of the most relevant domain names scraped) and a narrative description of all other data sources used (e.g. user data or synthetic data) to ensure completeness of the summary regarding the content used for the model training.
Relevant data processing aspects: this section of the Template requires disclosure of certain data processing aspects that are relevant for the exercise of the rights of parties with legitimate interests under Union law. This is especially important for compliance with Union law on copyright and related rights and for the removal of illegal content to mitigate the risk that such illegal content may be reproduced and disseminated at scale by the general-purpose AI model.
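
For providers preparing their own internal drafts, the three sections above can be mirrored in a simple data structure. The following is only an illustrative sketch in Python: every class and field name is hypothetical, chosen to echo the section descriptions above, and none of it reproduces the official Template or its online form.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GeneralInformation:
    # Hypothetical fields echoing "General information".
    provider_name: str
    model_name: str
    modalities: list[str]
    training_data_characteristics: str = ""


@dataclass
class DataSources:
    # Hypothetical fields echoing "List of data sources".
    main_datasets: list[str] = field(default_factory=list)
    scraped_data_description: str = ""
    relevant_domains_summary: list[str] = field(default_factory=list)
    other_sources_description: str = ""


@dataclass
class DataProcessingAspects:
    # Hypothetical fields echoing "Relevant data processing aspects".
    copyright_compliance_measures: str = ""
    illegal_content_removal_measures: str = ""


@dataclass
class TrainingContentSummary:
    general: GeneralInformation
    sources: DataSources
    processing: DataProcessingAspects


def empty_fields(section) -> list[str]:
    """Return the names of fields in a section that are still empty."""
    return [f.name for f in fields(section) if not getattr(section, f.name)]


# Placeholder values for illustration only.
summary = TrainingContentSummary(
    general=GeneralInformation("Example Provider", "example-model", ["text"]),
    sources=DataSources(main_datasets=["Hypothetical public corpus"]),
    processing=DataProcessingAspects(),
)
```

Calling `empty_fields(summary.sources)` on this draft lists the narrative fields that have not yet been filled in, which may help when checking a draft for completeness before it is transferred to the actual Template.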

Balance with trade secrets and confidential business information

As explained in Recital 107 AI Act, the Template should seek to strike a balance between serving the interests of parties with legitimate interests and promoting increased transparency of the training content in a meaningful way, while respecting the rights of all parties concerned, in particular taking due account of the need to protect trade secrets and confidential business information.

This careful balancing should be implemented in relation to the information that the Template requires to be disclosed in order for providers to fulfil their obligation under Article 53(1)(d) AI Act and provide a ‘sufficiently detailed’ public summary of the training content. The provision of information regarding more specific details about the content used for the training of the general-purpose AI models is required in the Template only where it is necessary to enable the exercise of rights protected under Union law in a meaningful manner.

To protect providers’ trade secrets, different levels of detail are required in the Template depending on the source of data considered. The Explanatory Notice specifies the scope of that data for specific sources of data: licensed data, private datasets, data scraped from online sources…

Simple, uniform and effective reporting

The information requested by the Template is to be provided in a narrative, simple and effective form. The Template aims to ensure the reported information is useful and understandable to the public and to the parties concerned, while avoiding unnecessary burden on providers of GPAI models, including SMEs.

Each Section of the Template includes clear and short instructions to allow providers to report the required information in an easy and uniform manner. The Commission has provided the Template as an online form and published it on its website.[10]

Providers should ensure that the information included in the Summary is reported in good faith and in an accurate and comprehensive manner. Flexibility is provided under specific sections, as indicated in the Template, to disclose only information that is relevant, necessary for the purpose of the Summary, and practicable to obtain (e.g. regarding the categorisation of some of the content or the characteristics of the training data, or the period of data collection).

The AI Office may verify whether the Template has been filled in correctly in order to assess if the provider has complied with Article 53(1)(d) AI Act. In this context, the AI Office has all enforcement powers under the AI Act and may request corrective measures. Non-compliance may be sanctioned with fines of up to 3% of the provider’s annual total worldwide turnover in the preceding financial year or EUR 15 000 000, whichever is higher. The lawful collection and processing of the data remains the responsibility of the provider under other applicable Union law (e.g. copyright and data protection).
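
The "whichever is higher" rule for the maximum fine can be illustrated with a short calculation. This is a minimal sketch in Python: the function name and the example turnover figures are hypothetical, amounts are assumed to be whole euros, and the result is only the upper limit stated above, not the fine that would actually be imposed in a given case.

```python
def max_fine_eur(annual_worldwide_turnover_eur: int) -> int:
    """Upper limit of the fine: the higher of 3% of turnover or EUR 15 000 000."""
    three_percent = annual_worldwide_turnover_eur * 3 // 100  # integer euros
    return max(three_percent, 15_000_000)


# Hypothetical turnover figures, for illustration only.
large_provider_cap = max_fine_eur(1_000_000_000)  # 3% (EUR 30 000 000) exceeds the fixed amount
small_provider_cap = max_fine_eur(100_000_000)    # 3% is EUR 3 000 000, so the fixed amount applies
```

Under these assumptions, the fixed amount of EUR 15 000 000 is the relevant ceiling for any provider with a turnover below EUR 500 000 000, and the 3% figure takes over above that level.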

[1] The exception for free and open-source general-purpose AI models under Article 53(2) AI Act does not apply to the obligation to make the Summary publicly available.
[2] Article 2 AI Act and Guidelines on the scope of the obligations for providers of general-purpose AI models established by the AI Act.
[3] https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models
[4] Article 17(2) of the Charter of Fundamental Rights of the European Union, OJ C 326, 26.10.2012, p. 391–407.
[5] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC (Text with EEA relevance.), PE/51/2019/REV/1, OJ L 130, 17.5.2019, p. 92–125.
[6] Article 21 of the EU Charter of Fundamental Rights.
[7] Article 22 of the EU Charter of Fundamental Rights.
[8] Article 11(1) of the EU Charter of Fundamental Rights.
[9] Article 13 of the EU Charter of Fundamental Rights.
[10] https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models