A trillion pixel problem

By Ed White (VP and principal analyst, Clarivate) and Arun Hill (Senior consultant, Clarivate)

AI poses fundamental questions for intellectual property (IP) law: what is intellect, and who (or what) can own it? While this affects almost all classes of IP, arguably the biggest challenge – due to its potential reach to everybody – lies in the creative space: copyright.

It is a common enough trope: the law grappling with technical advancement. The 19th century with telegrams and railroads, the later emergence of the car, the question of privacy in the information age. It is perhaps easy to see the question generative AI poses to our legal structures as just another requirement for new regulation and statute, in a long and well-trodden tradition.

Yet here, the fundamentals are in play. Our laws assume human ingenuity. Many countries and courts are not yet prepared to recognize or grant protection and ownership to works created without any human participation.

At the same time, large language models and visual generative systems are creating new works – pictures, text, software. Art and authorship itself are being questioned, in terms that reach back to the role and definition of creativity itself.

The spark itself

Dealing with these questions requires some understanding of exactly how these technologies work. One argument is that generative AI is simply a further (highly developed and advanced) application of modern computational power for repetition, in which mass analysis of patterns in existing creative works – the trillions of words and image pixels accessible to these systems – is leaned upon to generate new compilations.

Put another way, a pico-morsel of existing creativity, multiplied and repeatedly analyzed across a billion works, provides the source material for a new image, or for a new and possibly enjoyable or informative ordering of words. Is it feasible, or even meaningful, to define plagiarism at that level?

It also raises the question of what human creative spark is involved when an author, music composer or artist does the same thing. There are only so many story arcs, character types or chord sequences. There is even a question of whether generative AI techniques are inherently more creative than humans, given their multiplicative, emulative capacity for drawing on inspiration at a scale impossible for any human.

Further, the use of generative AI by a creator is likely the most common application of these systems. The creator's inputs, selection and guidance resemble the thought processes of an artist or writer, though with the skill barrier removed. In short, it is a productivity and access increase similar in kind to (if much, much smaller than) that brought by film editing software, digital artwork techniques or internet libraries.

Continuing need for humans

However, there is a problem: whose intelligence, whose creativity, is the source of inspiration? For the generative AI systems that have emerged in recent months, that source is almost entirely human. Indeed, the training and performance tuning of these systems is also being done by humans – one reason many are opened to free access.

For the near future, AI systems will continue to require the injection of new creativity to maintain their performance. If we imagine a future scenario in which generative AI uses AI-produced works as its source material – plausible given the ease of creation and the huge increase in the volume of such works – entropy implies that performance will quickly degrade.
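The degradation anticipated above can be illustrated with a deliberately simplified toy simulation (not a model of any real system; all names here are illustrative): a "model" reduced to a single normal distribution is repeatedly refit to finite samples of its own output, and sampling noise compounds generation after generation.

```python
import random
import statistics

def retrain_on_own_output(mean, stdev, generations, samples_per_gen=30, seed=0):
    """Toy sketch of a model repeatedly trained on its own output.

    The 'model' here is just a normal distribution (mean, stdev). Each
    generation it is refit to a finite sample drawn from the previous
    generation's output. Because estimation noise compounds
    multiplicatively, the estimated spread (a crude stand-in for
    creative diversity) tends to drift toward collapse over many
    generations.
    """
    rng = random.Random(seed)
    history = [(mean, stdev)]
    for _ in range(generations):
        samples = [rng.gauss(mean, stdev) for _ in range(samples_per_gen)]
        mean = statistics.fmean(samples)    # refit: new mean
        stdev = statistics.stdev(samples)   # refit: new spread
        history.append((mean, stdev))
    return history
```

Run with small samples over enough generations, the recorded spread tends to shrink relative to its starting value – a loose, statistical echo of the "entropy" argument above.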

The implication is that new human creativity – that spark we struggle to perceive – is required. For copyright, this means it will likely retain its value, and therefore its need for a robust system of legal protection.

Back to basics

As the fundamentals of intellect and property are relevant to these questions, it is worthwhile returning to the original reasoning for IP laws themselves. What is their intended public good?

This differs across IP classes. In simple terms: in patent protection, it is the public-informative disclosure of inventions – so that innovators learn from the latest developments – combined with the incentive to innovate. For trademark, it is a badge of identity enabling consumer protection, combined with the incentive to register.

For copyright, it is cultural richness, and incentivizing the creation that enhances it. Disclosure is to a degree implicit – movies, books, art and songs have audiences.

Can the desire for new creative works that create cultural enjoyment be met by artificial means? Likely, yes.

Forecasting harms

There is another side to the equation: the harms that potentially arise in an environment where mass production of works can occur at speed and with little effort.

If monetary value can be extracted by owners or users of AI systems, how are the original content creators who provided the tiny morsels of inspiration (or, for that matter, a significant proportion of it) to be rewarded?

How will consumers of artificially created works know that the work was artificially created, and will it matter? Here we can see issues of accountability. If we take an example of reference works – a textbook for instance – there is the potential for error or false information to be inappropriately converted into an authoritative source. Today we, in essence, judge authority via credentials – the author as an expert.

These harms wrap up into the now-common 2023 concerns of consent (both in the authorization of works as generative AI sources, and in informed reliance by users), transparency, accountability and liability.

The consent to and authorized use of copyrighted works in large language models is a highly relevant question for commercial content creators. The existence of a problem (or at least the perception of one) in this space was underscored by the recent announcement from ChatGPT developer OpenAI that it would indemnify users of its models against copyright infringement claims.

The use of proprietary content, rather than a restriction to purely open-access content, is undoubtedly to the performance benefit of generative AI systems, producing better focused, unique and potentially more accurate results. Leaning upon existing scholarship, discovery, creativity and human intelligence is therefore likely a direct commercial advantage to these systems, as well as to their users; their content libraries are as important as their algorithms. Equally, it likely harms or undermines the incentive to create, study or discover in the first place.

In addition, the drive towards developing unique or more useful AI inputs and prompts means the imperative to understand the copyright issues is not limited to solution providers, but extends to users also.

Will the market intervene?

Beyond the immediate copyright law issues, any amendment to those laws should also consider the likely changes to the market for creativity, and the value placed upon it, once mass automatic creation is available.

Will the price of automated content reach parity with that of human creators? What will be the effect of a significant increase in the supply of content, assuming demand remains the same? How will the algorithms' need for new human-created content be serviced and ensured?

Consumers clearly value the voices implicit in today’s major creative brands – the Marvel movie franchise, the classical scholarship of Mary Beard or the songs of Taylor Swift.

These market forces will be best allowed to develop in an environment in which consumers of creative works have visibility into the sources and systems that created them. Transparency, and the two-way consent it facilitates, is therefore paramount.

Planning for scenarios

Shifting to the practicalities of copyright protection, the law globally is today largely harmonized by conventions and treaties (e.g. the Berne Convention, and the World Trade Organization's Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS)), so that content owners can expect consistent treatment of their rights around the world.

It is probable that the initial shifts in copyright as applied to AI content will come from disparate case law decisions, beginning a patchwork of protection schemes across locales – clearly undesirable for content creators, AI developers, users and consumers.

But if we aim for a consistent global approach – a difficult but likely necessary update to the global copyright conventions – what happens in each of two scenarios: 1) machine-created content is copyrightable, or 2) it is not?

In the first scenario, the supply of creative content is potentially enormously increased, democratizing access to creative industries and techniques, and with vast productivity increases for all forms of copyrightable content.

In the second scenario – automated content is barred from copyright – creative sources are likely to go underground, their use hidden from consumers and audiences, with harms potentially going undetected and property rights abused.

Assuming the absence of sentient AI for some time to come, intellectual property – all property, for that matter – will continue to be ultimately owned and controlled by human beings, even if at a distance via corporate entities.

The question then boils down to consent: willingness to consume automated content, and authorization for human-created content to be consumed and analyzed.

Room for both

To maximize the usefulness of generative AI content creation technologies while minimizing the harms, it seems prudent to debate the benefits of new classes of property. The cat is out of the bag; these systems are already producing massive quantities of software, text and artwork. The efforts and interests of governments to address AI more generally are interwoven with the question of ownership of the input and the output.

We can see the desirability of, for example, statutory watermarking or required registration of automated creativity (though likely raising privacy and other concerns) to enable consent. To incentivize that process of transparency, ownership rights for machine content would seem a fair trade. That may be feasible via new "petty" copyrights – reduced in term, perhaps reduced in the level of enforcement, but underpinning the incentives for continued human creativity.
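As a sketch of what machine-readable registration could look like (all names and keys here are hypothetical; real statutory watermarking would involve far more, such as robust marks embedded in the media itself), an AI system could publish a signed provenance record for each work, which anyone holding the registry key could later verify:

```python
import hashlib
import hmac

# Hypothetical signing key held by a registry or the AI provider.
REGISTRY_KEY = b"example-registry-key"

def provenance_record(content: bytes, generator_id: str, key: bytes = REGISTRY_KEY) -> dict:
    """Create a machine-readable record declaring a work as AI-generated."""
    digest = hashlib.sha256(content).hexdigest()
    tag = hmac.new(key, f"{generator_id}:{digest}".encode(), hashlib.sha256).hexdigest()
    return {"generator": generator_id, "sha256": digest, "tag": tag}

def verify_record(record: dict, content: bytes, key: bytes = REGISTRY_KEY) -> bool:
    """Check that the record matches the content and was signed with the key."""
    digest = hashlib.sha256(content).hexdigest()
    expected = hmac.new(key, f"{record['generator']}:{digest}".encode(), hashlib.sha256).hexdigest()
    return record["sha256"] == digest and hmac.compare_digest(expected, record["tag"])
```

The design point is the trade the paragraph above describes: the record is cheap for the generator to produce, and in exchange for that transparency the registered work could attract the reduced "petty" right.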

Re-empowering creators

For the input side of the generative process – the proprietary content already in existence – arguably the existing copyright protection it enjoys remains fit for purpose. Existing laws do consider the extent of reliance on a pre-existing work, and current cases appear to lean towards that view. This disparate case law is needed initially to provide guidance for potential legislation tightening the definition of what constitutes fair usage by AI systems, and of what controls creators have over their works once they leave the pen.

Treaties such as the Berne Convention, and many civil law jurisdictions, contain a broader set of ethical and legal authors' rights around treatment and attribution (or, indeed, non-attribution) that generative AI systems may need to consider, or that legislators may wish to re-embed in wider AI regulation. These older and more obscure rights (which are not universally implemented) around integrity and attribution would seem highly applicable to visual and text generation systems, perhaps with new tests for the level of reliance, or new forms of "analysis rights".

However, even transparency measures for generative systems – for example, a new legal requirement to disclose their sources, or AI regulation that bakes in transparency – may struggle to account for the imperceivable contribution of all the human creativity (or, indeed, future machine creativity) that inspired their output. That is an improbably long list of footnotes.

Little is currently defined. Little is regulated. The law assumes creativity is a human activity.

When AI can create, inspired by the pre-existing trillions of pixels and words, and as yet without the legal architecture to balance the rights of creators against their formidable power, we have a problem.

This article was republished with the permission of World IP Review.