OpenInfra Foundation Policy for AI Generated Content

Context

Technology rooted in artificial intelligence (AI) is an actively evolving area with exciting technical possibilities and significant legal uncertainties. The OpenInfra Board of Directors wants to encourage exploration and adoption of new technologies while exercising reasonable caution around potential risks.

Currently, there are two broad categories of use cases we need to be mindful of:

Predictive - Often viewed as “suggestive auto-complete”. The tool offers suggested fragments, and the contributor decides which ones to adopt and how to modify them for the work at hand.
Generative - The contributor provides prose describing what they want, and the AI attempts to compose a result. The prose may then be revised repeatedly until a suitable result is reached.

Challenges

Copyright law in this area is still evolving, and the landscape will take some time to stabilize. As of March 16th, 2023, computer-generated work is not considered an original work that can be copyrighted in the United States, and similar stances are being taken in other countries around the world.
Source training data, and thus resulting material, may come from materials which have unclear or incompatible copyrights and/or licenses. In other cases, copyright of any generated code may be explicitly retained by the vendor operating the AI technology, which is incompatible with contribution to projects.
This is an evolving area, and tools will evolve. What may be a Predictive tool today could be a partially Generative tool next week. Contributors need to stay aware of this and act according to each particular situation, which is the very reason for this document.

Applicability

This policy applies to all contributions of content committed into source revision control systems by projects housed under the OpenInfra Foundation.

Policy and Guidance

It is the policy of the OpenInfra Foundation that:

Contributions must be compatible with the principles of the Four Opens.
Contributions created using Predictive or Generative AI tools are generally permitted if contributors and reviewers follow the checklists below.
Contributions to OpenInfra Foundation projects are distributed under open source software licenses (Apache 2.0 or other OSI-approved licenses), so code or content included in a contribution must be compatible with those licenses. The license of a contribution does not need to be exactly the same as the project's license. Being compatible means that the contribution's license grants sufficient rights to allow everything the project's license allows (or allows more), and imposes similar restrictions (or fewer restrictions). Many open source licenses are compatible with other open source licenses, and code or content in the public domain is compatible with all open source licenses. Contributors need to verify they have the right to contribute output from AI tools, just like they do for their own original work, work owned by their employer, work copied or modified from another open source project, or work submitted on behalf of a third party.
- Where possible, configure the AI tool to operate in modes that respect open source licensing. This will be different for each tool, but some examples of helpful options are: exclusively using compatibly licensed inputs to train the model, using a model released as open source, using a model trained exclusively on compatibly licensed code, licensing the code or content output by the tool under a compatible license, or running secondary scans to look for direct copies of training inputs in generated code.
- Any copyrighted materials authored or owned by third parties could be problematic, so make sure they are licensed as open source or public domain, or that you have permission from the copyright holder to release them as open source. Make sure the AI tool doesn’t claim proprietary rights to the code or content generated by the tool.
We generally expect contributions to be made by a human taking an action, so the contributor has a chance to review their contribution for any technical or legal problems before submitting. The exception to this rule is that we do allow submissions from well-documented automated processes, such as release tooling or internationalization updates.
This policy will be re-evaluated and updated as the law, technology, and open source best practices continue to evolve.
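
The compatibility rule stated above (a contribution's license must grant at least the rights the project's license grants, and impose no additional restrictions) can be read as a pair of set comparisons. The following is a minimal sketch of that reading only. The `LicenseTerms` type, the `is_compatible` function, and the term names such as "keep-notices" are hypothetical simplifications invented for illustration; they do not describe any real license, and real compatibility questions need actual license analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, highly simplified description of a license."""
    grants: frozenset        # rights the license gives to recipients
    restrictions: frozenset  # conditions the license imposes on recipients


def is_compatible(contribution: LicenseTerms, project: LicenseTerms) -> bool:
    """Apply the rule from the policy text as two set comparisons.

    The contribution must grant everything the project grants (or more)
    and must not add restrictions the project does not already impose.
    """
    grants_enough = contribution.grants >= project.grants
    no_extra_restrictions = contribution.restrictions <= project.restrictions
    return grants_enough and no_extra_restrictions


# Illustrative terms only; these are assumptions, not real license texts.
project = LicenseTerms(
    grants=frozenset({"use", "modify", "distribute"}),
    restrictions=frozenset({"keep-notices"}),
)
public_domain = LicenseTerms(
    grants=frozenset({"use", "modify", "distribute"}),
    restrictions=frozenset(),
)
vendor_retained = LicenseTerms(
    grants=frozenset({"use"}),
    restrictions=frozenset({"keep-notices", "no-redistribution"}),
)

if __name__ == "__main__":
    print(is_compatible(public_domain, project))
    print(is_compatible(vendor_retained, project))
```

In this model, content with no restrictions and full grants passes against any project license, which mirrors the statement that public domain content is compatible with all open source licenses. Output whose rights are retained by a tool vendor fails on both comparisons.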

Contributor Checklist

As a contributor, you are responsible for the code you submit, whether you use AI tools or write it yourself. Some AI tools offer settings, features, or modes that can help, but these are no substitute for your own review of code quality, correctness, style, security, and licensing.

With all AI tools, contributors should be mindful of their limitations. Carefully review any suggested or generated code or comments to ensure nothing is inherently harmful, malicious, or outright incorrect.
OpenInfra projects will adopt the “Generated-By:” label as proposed by the Apache Software Foundation as part of their Generative Tooling Guidance.
- For contributions created using a Generative AI tool:
  - Generative AI tools should be operated in modes which are compatible with the Open Source Definition as maintained by the OSI.
  - When available, Generative AI features that flag output that resembles publicly available code and provide licensing information should be enabled. Such results should be used to prevent the contribution of incompatibly licensed open source code.
  - When available, Generative AI features that are designed to block output suggestions that match publicly available code should be enabled.
  - Add a “Generated-By:” label to the commit message, and explain in comments or the commit message any prompts or background context the reviewers might need to fully understand the change and how much of the change was generated by the tool.
  - If available, secondary scans should be performed to identify any direct copies of Training Inputs which may have been inserted in the generated output.
- For contributions created using a Predictive AI tool:
  - If your commit includes substantial suggestions from the tool, add the “Generated-By” label to your commit message and explain in comments or the commit message any context the reviewers might need to understand the change and how much of the change came from the tool.
By contributing you are indicating you have the permission and rights to submit the content to a project, so take care in checking that the output of the tool is compatible with the project’s license.
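
The labeling step in the checklist above can be sketched as a small helper that appends the reviewer context and a “Generated-By:” trailer to a commit message. This is only an illustration: the function name `add_generated_by` is hypothetical, and the sketch assumes the message does not yet contain a trailer block. It relies on the Git convention that trailers form the last paragraph of the message, separated from the body by a blank line.

```python
def add_generated_by(message: str, tool: str, context: str) -> str:
    """Return a commit message with reviewer context and a Generated-By trailer.

    Assumption: `message` has no trailer block yet, so the new trailer can
    simply become the last paragraph of the message.
    """
    if not tool.strip():
        raise ValueError("tool name must not be empty")
    if not context.strip():
        # The policy asks contributors to explain prompts or background
        # context, so the sketch refuses to add a bare label.
        raise ValueError("context for reviewers must not be empty")
    body = message.rstrip("\n")
    return f"{body}\n\n{context.strip()}\n\nGenerated-By: {tool.strip()}\n"


if __name__ == "__main__":
    print(add_generated_by(
        "Add additional unit tests",
        "copilot",
        "Copilot suggested the new test cases; I reviewed and edited each one.",
    ))
```

Requiring a non-empty context string is a design choice made here so that the label never appears without the explanation the checklist asks for.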

Reviewer Checklist

When reviewing contributions with the “Generated-By” label, verify that the change includes sufficient explanation of the context that the reviewer and future contributors can understand the purpose and origin.
Apply a higher level of scrutiny to contributions created using AI tools, understanding the limitations of the tools. This does not mean automatically rejecting all contributions that use AI tools; it means giving them the same consideration of technical and legal merits and standards as you would give to any other change.
Code style changes may be necessary to meet project standards and community guidelines; please work with the contributor as needed.
If the change set is substantially re-worked by human changes during the code review process, consider whether it makes sense to remove the “Generated-By” label prior to committing.
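
As a rough aid for the first item in this checklist, a reviewer-side script could flag labeled commits whose message contains little or no explanation. The sketch below is an assumption-laden illustration, not project tooling: the function name `review_flags` and the ten-word threshold are hypothetical, and a word count can only hint at missing context. It cannot judge whether the explanation is actually sufficient, which remains the reviewer's call.

```python
import re

# Matches a "Generated-By: <tool>" line anywhere in the message.
GENERATED_BY_RE = re.compile(
    r"^Generated-By:[ \t]*(\S.*)$", re.IGNORECASE | re.MULTILINE
)
# Rough pattern for any "Key: value" trailer line, used to skip such lines
# when counting explanatory words.
ANY_TRAILER_RE = re.compile(r"^[A-Za-z][A-Za-z-]*:[ \t]")


def review_flags(message: str, min_body_words: int = 10) -> list[str]:
    """Return reviewer notes for a commit message; an empty list means no flags.

    `min_body_words` is an arbitrary illustrative threshold.
    """
    tools = [tool.strip() for tool in GENERATED_BY_RE.findall(message)]
    if not tools:
        return []  # no label, so this particular check does not apply
    lines = message.strip().splitlines()
    # Skip the subject line, blank lines, and trailer-like lines.
    body = [
        line for line in lines[1:]
        if line.strip() and not ANY_TRAILER_RE.match(line)
    ]
    word_count = sum(len(line.split()) for line in body)
    notes = []
    if word_count < min_body_words:
        notes.append(
            f"Generated-By ({', '.join(tools)}) is present but the message "
            f"body has only {word_count} words of explanation"
        )
    return notes


if __name__ == "__main__":
    print(review_flags("Fix typo\n\nGenerated-By: copilot\n"))
```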

Example Commit Message

commit 988881adc9fc3655077dc2d4d757d480b5ea0e11
Author: Jane Doe <[email protected]>
Date: March 14 19:34:50 2024 +0900

Add additional unit tests

While performing code review on commit 79c509301a936b89617dab2a632c23ac,
I noticed there were a half dozen additional cases the tests didn’t
cover. I used my Copilot-enabled IDE to edit the DDT YAML files, and
based upon the test name I typed out, Copilot suggested nearly perfect
tests based upon the content in the file.

Generated-By: copilot
Change-Id: I988881adc9fc3655077dc2d4d757d480b5ea0e11

These steps and actions are required so that we have context on the contributions, should we need it, as the overall landscape evolves in the years to come.

References

U.S. Copyright Office Registration Guidance Pertaining to Works Generated by Artificial Intelligence - https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence
Four Opens - https://openinfra.dev/four-opens/
Apache Software Foundation Generative Tooling Guidance - https://www.apache.org/legal/generative-tooling.html

Change History

Version	Description	Date	Author
0.1	Initial Draft	November 9th, 2023	J. Kreger
0.2	Revision including T. Carrez's feedback	February 16, 2024	J. Kreger & T. Carrez
0.3	Four opens link and content acceptability	February 22, 2024	J. Kreger
0.4	Formatting changes, minor wording changes	February 22, 2024	J. Kreger on behalf of the working group
0.5	Revisions from C. Stevenson	February 28, 2024	C. Stevenson
0.6	Rewrite the policy section into contributor/reviewer checklist oriented statements to provide clear guidance as action items from board working group discussion, and reformat the change history into a table, and further minor text edits suggested by A. Marrich.	March 14, 2024	J. Kreger
0.7	Revisions to make the text both easier to read and more legally precise.	March 26, 2024	A. Randal
0.8	Revisions from C. Stevenson	April 8th, 2024	C. Stevenson
0.8.1	Minor formatting revision and change for clarity, adoption of suggestion from C. Stevenson.	April 17, 2024	J. Kreger
0.9.0	Clarification of Permissively Licensed suggestion from C. Stevenson to Compatible License.	April 23, 2024	A. Randal
0.9.1	Change the word "less" to "fewer" in terms of license restrictions.	May 21, 2024	C. Stevenson
0.9.2	Revision of contributor guidelines based upon feedback from J. Blair	May 31, 2024	A. Randal
