Open Source AI Standard Defined

Publication date: 05.09.2024 11:28:00

Amid much debate about what constitutes open source in artificial intelligence, some long-awaited clarity has arrived.

The Open Source Initiative (OSI), an organization dedicated to defining open source standards, has released its first definition for AI models, in what could be a significant step in regulating and advancing the field.

The standard was developed by a group of 70 experts, including researchers, lawyers, policymakers, and representatives of major companies such as Meta, Google, and Amazon.

The new standard states that an open-source AI system should be available for anyone to use without permission, and that researchers should be able to examine its components and how it works.

The definition also emphasizes the importance of being able to modify a system for different purposes and to share it freely. This includes requirements for transparency regarding the data used to train the model, the source code, and the “weights” (the numerical values that are updated during the training process and play a key role in how the model processes inputs to produce output).

Before the standard, there was no agreement on what exactly constitutes open source AI. For example, while Meta's and Google's models were described as open source, their license terms and the unavailability of their training data raised questions about whether they were truly open source.

Some companies use the term “open source” for marketing purposes, which can be misleading to users.

Avijit Ghosh, a researcher at Hugging Face, noted that this can create a false sense of trust in such models, even when researchers cannot verify their openness.

Ayah Bdeir, a senior advisor at Mozilla, also took part in developing the standard. She noted that questions about the transparency of training data were the most contentious.

Transparency about data sources is important, as the lack of it has already led to numerous lawsuits against companies like OpenAI.

Ultimately, the new standard requires that open-source models provide enough information about the training data that a skilled person could recreate a similar system using similar data.

This compromise between full disclosure and copyright compliance helps establish a new level of openness.

OSI also plans to create an enforcement mechanism that will flag models that are described as open but do not meet this definition. A list of models that meet the new standard will be published in the future.


(The text is translated automatically)