HONG KONG — To paraphrase the late John F. Kennedy, the Open Source Initiative chose to define open source AI not because it would be easy, but because it would be hard: because that goal would serve to organize and measure the best of its energies and skills.
Stefano Maffulli, executive director of the Open Source Initiative (OSI), said artificial intelligence (AI) does not fit neatly into the combinations of software and data that existing open source licenses were written to cover. “Therefore, we need to create a new definition of open source AI,” Maffulli said.
Firefox’s parent organization, the Mozilla Foundation, agrees.
A Mozilla representative explained that big tech companies “have not always fully adhered to open source principles when it comes to AI models,” and said the new definition “will be helpful to lawmakers working to develop rules and regulations to protect consumers from AI risks.”
OSI has been working hard to create a comprehensive definition of open source AI, similar to the Open Source Definition for software. This effort responds to the growing need to spell out what constitutes an open source AI system, at a time when many companies claim that their AI models, such as Meta’s Llama 3.1, are open source when in fact they are not fully open.
The latest OSI Open Source AI Definition draft, 0.0.9, has several important changes. These are:
- Clarification of definitions: The definition clearly identifies the model and weights/parameters as part of an AI “system” and emphasizes that all components must adhere to open source standards. This clarity ensures that the entire AI system, not just parts of it, adheres to open source principles.
- The role of training data: Training data is beneficial but not required for modifying AI systems. This decision reflects the complexities of data sharing, including legal and privacy concerns. The draft categorizes training data as open, public, or non-public data that cannot be shared, with specific guidelines for each to increase transparency and understanding of bias in AI systems.
- Checklist separation: The License Evaluation Checklist has been separated from the main definition document to align with the Model Openness Framework (MOF). This separation allows for a focused discussion of open source AI identification while maintaining the general principles of the definition.
As Jim Zemlin, executive director of the Linux Foundation, explained at Open Source Summit China, the MOF is “a way to help us evaluate whether a model is open. It allows us to rate the model.”
There are three levels of openness within the MOF, Zemlin added: “Level 1, which is the highest level, is the open-science definition, where the data, all the components used, and all the instructions are available so you can actually create your own model in exactly the same way. Level 2 is a subset of that, where not everything is open, but it’s mostly open. And then at level 3, there are areas where the data may not be available, but data that describes the dataset is. The models are open, but it’s understood that not all the data is available.”
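To make those gradations concrete, here is a minimal, purely illustrative sketch in Python of how a model release might be mapped onto the three levels Zemlin describes. The names (MOFLevel, ModelRelease, classify_release) are hypothetical and not part of any official OSI or Linux Foundation tooling; the real MOF checklist is far more detailed.

```python
# Illustrative only: a rough encoding of the three MOF openness levels described above.
# The class and function names here are hypothetical, not official MOF or OSI tooling.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MOFLevel(Enum):
    OPEN_SCIENCE = 1  # Level 1: data, code, and all components needed to rebuild the model
    MOSTLY_OPEN = 2   # Level 2: not everything, but the model and tooling are open
    OPEN_MODEL = 3    # Level 3: model is open with a description of the data; raw data withheld


@dataclass
class ModelRelease:
    weights_released: bool
    code_released: bool
    training_data_released: bool
    data_description_released: bool


def classify_release(release: ModelRelease) -> Optional[MOFLevel]:
    """Roughly map a release onto a MOF level; None means it does not qualify as open."""
    if release.weights_released and release.code_released and release.training_data_released:
        return MOFLevel.OPEN_SCIENCE
    if release.weights_released and release.code_released:
        return MOFLevel.MOSTLY_OPEN
    if release.weights_released and release.data_description_released:
        return MOFLevel.OPEN_MODEL
    return None


# Example: only the weights and a description of the training data are published.
print(classify_release(ModelRelease(True, False, False, True)))  # MOFLevel.OPEN_MODEL
```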
These three levels of openness, a gradation that also shows up in how the draft treats training data, will be hard for some open source purists to swallow. As the debate continues over which AI and machine learning (ML) systems are truly open and which are not, disputes will likely emerge around both the models and the training data.
The development of the Open Source AI definition has been undertaken in collaboration with a wide range of stakeholders from around the world, including Code for America, Wikimedia Foundation, Creative Commons, Linux Foundation, Microsoft, Google, Amazon, Meta, Hugging Face, the Apache Software Foundation, the United Nations International Telecommunication Union, and many others.
OSI has held numerous town hall meetings and workshops to gather input to ensure the definition is inclusive and reflects a variety of perspectives, a process that is ongoing.
The definition will continue to be refined and honed through global roadshows and by gathering feedback and support from various communities.
OSI’s Maffulli knows that not everyone will be happy with this draft definition. In fact, before this version appeared, Tom Callaway, principal open source technical strategist at AWS, posted on LinkedIn: “It is my strong belief (and the belief of many others working in open source) that the current open source AI definition does not accurately ensure that AI systems retain the unlimited rights of users to run, copy, distribute, study, modify, and improve them.”
Now that the draft has been made public, others will be able to weigh in; OSI hopes to present a stable version of the definition at its All Things Open conference in October 2024. If all goes well, the result will be a definition that most (if not all) people can agree on in terms of fostering transparency, collaboration, and innovation in open source AI systems.