RFC: Clarifying our stance regarding AI-generated code in our CLA

I'm considering modifying the CLA like so:

Rationale for the CLA update (AI-assisted contributions)

Copyright law in major jurisdictions requires human authorship. Authorities such as the U.S. Copyright Office have made clear that purely machine-generated output is not copyrightable, while AI-assisted works are protected only to the extent that a human exercised creative judgment (e.g. selection, modification, arrangement):
https://www.copyright.gov/ai/

Given this, the CLA was updated to:

  • explicitly allow AI-assisted development, reflecting modern practice; and
  • exclude fully automated submissions, which would not provide reliable copyright ownership.

The clause deliberately avoids requiring contributors to make false or unverifiable warranties (for example about AI training data or model provenance), which contributors cannot realistically know and which do not meaningfully reduce legal risk. This approach aligns with guidance from open-source legal practitioners, such as Red Hat’s analysis of AI and open source:
https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues

Instead, the CLA anchors responsibility where the law recognises it: in the human contributor’s actions and rights. Contributors must confirm that they exercised meaningful creative judgment and that they have the legal right to submit the work, preserving a clean chain of title suitable for dual or proprietary licensing.

This approach is consistent with how major open-source projects are adapting their contribution policies, for example Fedora’s policy on AI-assisted contributions:
https://discussion.fedoraproject.org/t/council-policy-proposal-policy-on-ai-assisted-contributions/165092


Since it's an important topic that will affect all contributors, I'll leave the PR open for some time. Feedback is welcome!


How can AI-assisted be distinguished from fully automated? For a submission to count as fully automated, would the agent have to complete the entire workflow of submitting a PR without any user interaction, in addition to writing the code change?

There were a couple of recent PRs I submitted where the actual code change was written entirely by an AI agent. They were literally 1-5 line code changes, so there was nothing I needed to add or remove, but I committed the change myself, created the PR, and wrote the description myself.

Would that be considered AI assisted or fully automated?

That's a good question - fully automated would be, for example, a project where all code is generated without any user intervention at all and without any review process.

In those PRs you mentioned, you instructed the AI to make the change, you reviewed it, maybe you asked it to refine the change, etc. Once you decided it was good enough, you submitted it. Then I reviewed it again, and potentially we discussed changes that were applied, and so on.

All of that is human authorship. As long as a human reviewed, understood, and took responsibility for what was submitted, it counts as AI-assisted, not fully automated.

I think what would count as fully automated would be something like a bot that, for example, applies the linter to a PR and automatically commits the result. The output would not be copyrightable, but that doesn't really matter because the changes are trivial.

I think Fedora or Red Hat (I forget where I read this) also ask contributors to disclose when a large part of a PR was created using AI, and maybe we could ask for this too, if only so that we know what to expect when reviewing. Sometimes inexperienced developers create pure AI PRs, which are very low quality, and in that case we could just close them and ask the author to start over, instead of spending a lot of time discussing changes (that they may not even understand themselves).


It seems to me, Laurent, that your second paragraph is the important one. We want to know that some person “…decided that it was good enough and submitted it,” as has been the case historically (prior to AI-assisted development). The nature of automation used in the development process isn’t really the issue. However, what we’re talking about here is that AI-assisted development tools are increasingly used by people who lack experience in coding submissions and, therefore, are unfamiliar with what “good enough” entails. Perhaps being more explicit about that would help.

One aspect of “good enough” is the copyright issues you’ve mentioned, but other aspects include correctness, reliability, and safety. Nobody can promise that there are no problems, but at least contributors can say they’ve reviewed the code, believe it meets the criteria, and are willing to take (at least some) action when something doesn’t meet the criteria.

Thanks for the feedback. I don't think we need to define what is "good enough" in the CLA itself? Maybe that would even weaken it, because someone could sign it while their PR didn't quite meet the expected standards as defined by the CLA, and what is acceptable or not is also subjective.

In fact, in a way, it's OK to make a terrible PR, as long as it has some degree of human authorship - at the legal level, that's all we need.

However, it might indeed be something we should clarify in an informal document where we state our expectations for quality, reliability, etc. in AI-assisted contributions. That's also where we could ask contributors to disclose when AI was used to develop a large part of a PR.

I agree. The CLA should simply ensure that a human has agreed to it. A companion document is a good way to go.