
Summary Bullets:
• The New York Times' copyright infringement lawsuit against Microsoft and OpenAI highlights the challenges society faces in implementing AI technology responsibly and ethically.
• GlobalData has identified six broad key categories of issues related to responsible AI: explainability; bias; ethics; hallucinations, toxicity and poisoning; data privacy and data leakage; and copyright infringement.
During the week between Christmas and New Year's, the New York Times sued Microsoft and OpenAI for copyright infringement, making headlines during what is normally a very quiet time of year. The news organization claims that the two tech companies illegally used its content to train ChatGPT and other services they offer to consumers and enterprises. The move represents a change in strategy for the New York Times. Since last April, the newspaper had been negotiating with OpenAI and Microsoft for compensation for the use of its work to train large language models (LLMs), but no agreement has been reached so far. (The Associated Press, by contrast, already has a licensing deal in place with OpenAI.) This latest move by the New York Times will likely reinvigorate those conversations.
As usual, the regulatory environment has not kept pace with technological innovation. This time, the murky waters relate to copyright protections. Generative AI (GenAI) has raised several issues. Content producers do not want their material used to train LLMs and assert that it is protected from such use by copyright. On the other hand, many working in the technology industry believe that using such content to train LLMs is legally acceptable under the fair use doctrine. In the absence of a clear path forward, several organizations are blocking AI companies' web crawlers from scraping material from their sites, typically via directives in the site's robots.txt file.
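To illustrate, here is a minimal robots.txt sketch of that blocking approach. GPTBot is OpenAI's documented crawler user agent and CCBot is Common Crawl's; the exact set of directives shown is an illustrative assumption, as each site tailors the file to its own policies.

    # Block OpenAI's GPTBot from crawling any page on the site
    User-agent: GPTBot
    Disallow: /

    # Block Common Crawl's CCBot, whose web archives are widely used to train LLMs
    User-agent: CCBot
    Disallow: /

    # Allow all other crawlers (e.g., search engines) to continue indexing
    User-agent: *
    Allow: /

Note that robots.txt is advisory rather than technically enforced: well-behaved crawlers honor the Robots Exclusion Protocol, but the file cannot stop crawlers that ignore it, nor does it retroactively remove content already collected.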
The copyright debate is not limited to the inputs used to train LLMs; it also extends to the outputs. Can the work produced by GenAI be protected by copyright? The answer is generally no, it cannot, unless there is substantial human input. In a much-discussed decision in February 2023, the US Copyright Office granted copyright protection to the text and the selection and arrangement (but not the individual AI-generated images) of a comic book that was created with the help of AI, since those elements reflected substantive human input.
More importantly, the New York Times lawsuit highlights the challenges our society faces in implementing AI in a responsible manner. Responsible AI refers to the ideal that AI projects, whether based on predictive AI or generative AI, are deployed in a manner that safeguards privacy, does not cause harm, is as transparent as possible, is free from bias, and is fair to all who are impacted by them. GlobalData has identified six broad key categories of issues related to responsible AI: explainability; bias; ethics; hallucinations, toxicity and poisoning; data privacy and data leakage; and copyright infringement.
Challenges related to responsible AI have existed for years, but they have grown in number with the launch of GenAI and ChatGPT, and they have also become more pressing and more public. Organizations deploying AI must ensure that they are using the technology in a way that is responsible and ethical; otherwise, they risk significant damage to their brand reputation, if not legal and financial repercussions. Responsible AI is a highly ambitious goal, and getting there is a daunting task.
