Large language models and generative AI
Since the release of OpenAI’s ChatGPT in November 2022, there has been widespread public awareness of Large Language Models (“LLMs”). In a McKinsey survey conducted in the months following the release of ChatGPT,[1] 79% of a wide pool of respondents said they had had some exposure to Generative AI.
However, with the emergence of LLMs, there has been a corresponding increase in public awareness of the risks relating to Generative AI. Authorities and private parties have responded with attempts to set out guardrails that are relevant to the developing technology. The majority of private actions have been brought in relation to copyright infringements, although there have also been actions which consider potential anti-competitive conduct through the use of LLMs, demonstrating the overlap in these fields and the market effects which can arise from new technology.
In this article, we provide an overview of LLMs, and the development of regulation in the UK, US, and EU, both through public and private enforcement, and examine the complementary role these approaches can have in a fast-developing field.
Background to LLMs
LLMs are built upon “Transformer” technology, a machine-learning architecture created by Google researchers in 2017[2] that underpins all modern LLMs. The architecture was developed to address “sequence transduction”, that is, transforming an input sequence into an output sequence. It enables a language model to predict the next word or sub-word, called a “token”, based on the huge amounts of text the model observes during training. Within transformer technology, “neural networks” are responsible for retaining what they learn in training and determining the relationship between inputs in a sequence through their “chain-like” architecture[3] – for instance, the relationship between the words “sky” and “cloud”. As such, an LLM uses all of its “input tokens” to predict and generate human-like responses.
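By way of illustration only, the toy Python sketch below captures this next-token idea in its most reduced form: it counts which token follows which in a tiny invented text and then “predicts” the most frequent successor. The corpus, function name, and frequency-table approach are purely illustrative assumptions; a real LLM replaces this simple lookup with a transformer neural network whose probabilities are learned from billions of tokens.

```python
# A minimal, purely illustrative sketch of next-token prediction.
# This is a frequency lookup over a tiny invented corpus, not a transformer;
# real LLMs learn next-token probabilities with neural networks trained on
# billions of tokens.
from collections import Counter, defaultdict

corpus = "the sky is blue . the sky has a cloud . the cloud is white".split()

# Count which token follows each token in the example text.
follow_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follow_counts[current][nxt] += 1

def predict_next(token: str) -> str:
    """Return the token most frequently seen after `token` in the corpus."""
    counts = follow_counts.get(token)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("sky"))    # "is" (ties broken by first occurrence)
print(predict_next("cloud"))  # "."
```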
The launch of OpenAI’s ChatGPT sent the global tech industry into a race to release its own versions of LLM technology. OpenAI’s biggest rival is Google’s Bard, launched in March 2023. Prior to September 2023, ChatGPT operated with a cut-off date of September 2021 for the data on which it was trained.[4] Unlike ChatGPT, Bard has, from its inception, been able to search the web in real time and, in theory, pick up information from a Wikipedia page published only a day earlier. Meta released an open-source version of its LLM, LLaMA 2, for free commercial and research use in July 2023.
The datasets used to train LLMs are confidential and essential to the LLMs’ development. Even so, one of the first legal challenges swiftly emerged once the technology entered widespread use, when it became apparent that LLMs could reproduce copyrighted works with great accuracy, raising questions about the legal basis for the use of those materials in the training process. These questions encompassed both the input of the LLMs (i.e. were models trained using copyrighted content?) and the output of the LLMs (i.e. could an LLM reproduce copyrighted content?). To illustrate the extent of the issue, in September 2023, The Atlantic published an article that referenced a database authors could search to find out whether their books were included in “Books3” – a vast collection of some 170,000 books that were allegedly used to train LLaMA, among other LLMs, without the authorisation or knowledge of their authors.[5] Some expressed concerns that tech companies might have engaged in large-scale web scraping exercises to train their LLMs without implementing the controls necessary to guard against breach of copyright or, for that matter, the infringement of any other rights in the material used to train those models.
The issue is not confined to text generation. LLMs are a type of “Generative AI”,[6] automated models that generate text, images, music, audio, and video. Text-based AI models such as ChatGPT are the most well known, but there are several products on the market that generate other types of content. For example, the AI model Midjourney takes a descriptive text prompt and suggests several images that match the criteria in varying artistic styles. This model was developed using the typical training process for Generative AI – ingesting a vast dataset, “LAION-5B”, consisting of 5.85 billion images,[7] a large proportion of which is thought to be copyrighted. As a potential response to this challenge, another tech player, Stability AI, began asking artists to opt out of the next iteration of its image generator platform. This approach attempts to shift the onus onto artists to proactively safeguard their intellectual property.
Given that Generative AI covers text, audio, video, and visual content, the categories of affected persons are wide, encompassing content creators such as authors, visual artists, musicians, and actors, but also potentially data subjects and other individuals, as explored further below.
Developing case law
There have been several class action filings in the US alleging copyright infringement based on the unauthorised copying of written works as training material for LLMs (“input infringement”) and the output of allegedly infringing derivative works by Generative AI chatbots (“output infringement”).
Whether unlawful infringement is occurring in Generative AI inputs or outputs turns largely on the doctrine of fair use under US copyright law. This doctrine permits the unauthorised use of copyrighted materials for limited purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Pursuant to 17 U.S.C. § 107, courts consider four factors:
- The purpose and character of the use, including whether it is commercial, transformative, and non-expressive;
- The nature of the copyrighted work;
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- The effect of the use upon the potential market for the copyrighted work.
The first factor, which is decisive in the Generative AI context, hinges on two considerations: commerciality and transformativeness. Commerciality asks the straightforward question: was the use for profit? Transformativeness is less straightforward: it asks whether the copyrighted content has been built upon, and transformed to such an extent, that the new work serves a different purpose and does not infringe the copyright. This is nuanced, case-specific, and often unpredictable from an ex-ante litigation perspective.
As the US Supreme Court recently explained, the central question under the first fair use factor is “whether the new work merely supersedes the objects of the original creation (supplanting the original), or instead adds something new, with a further purpose or different character.”[8] The greater the difference in purpose and character, the more likely a finding of fair use; “[t]he smaller the difference, the less likely.”[9] For example, a parody does not supplant an original work because it fulfils a distinct purpose and complements rather than competes with the original. A summary, however, may constitute a substitute product in the same market as the original, where it sufficiently reproduces the creative elements of the original and serves the same purpose. Because transformativeness is a question of degree, it is case-specific. As a result, it is unlikely that fair use will be applied on an industry-wide basis for all LLMs and chatbots or for all their specific applications.
Nevertheless, generative AI platforms have claimed that the fair use doctrine permits them to freely use copyrighted works scraped from the internet for the purpose of training LLMs to form predictive associations and generate human-like expressive responses. The platforms contend they copy written works to identify patterns in language and study functionality. Thus, in their view, scraping is merely an intermediate step in the process of “teaching” LLMs how to generate entirely new written works. In this, the industry relies on a line of cases addressing “intermediate copying.” In Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992), the defendant copied Sega’s copyrighted software to identify the functional requirements to make games compatible with Sega’s gaming console. Similarly, in Sony Computer Entertainment Inc. v. Connectix Corp., 203 F.3d 596 (9th Cir. 2000), the defendant reverse-engineered a copy of Sony’s software to create a new gaming platform on which customers could play games designed for Sony’s gaming system. In both cases, the U.S. Court of Appeals for the Ninth Circuit (with jurisdiction over Silicon Valley, San Francisco, and Seattle) held this was fair use because the defendants copied material to extract uncopyrightable information for the purpose of developing a different product. Similarly, in the landmark case Authors Guild v. Google, Inc., 804 F.3d 202, 213–14 (2d Cir. 2015), the U.S. Court of Appeals for the Second Circuit (covering New York) held that Google’s copying of works for Google Books was fair use because it was essential to indexing digital copies and rendering them searchable online.
The contrary view asserts that LLM training does not simply extract information about copyrighted works (e.g., rendering books searchable or analysing the frequency of terms used by an author). Instead, it extracts the expressive elements of works to “teach” the model how to reproduce or simulate such expression. Thus, the argument goes, Sega’s and Sony’s games were not copied to write new games, and Google Books did not use its digital scans to write new books, so the outputs in those cases were not market substitutes, unlike, on this view, the outputs of Generative AI.
This distinction between expressive and non-expressive intermediate use appears to be the principal battleground in the current crop of copyright litigation involving Generative AI. Recently, a federal court in Delaware became the first court to address the issue. In Thomson Reuters v. Ross Intelligence,[10] the owner of Westlaw, a legal research platform and database, brought copyright infringement claims against AI start-up Ross Intelligence, arguing that Ross used Westlaw headnotes (proprietary summaries of cases prepared by Westlaw staff) to train Ross’s AI model to generate competing content. Ross claimed this was transformative fair use because the copying was an intermediate step, required to study language patterns to produce a new work. Ultimately, the court refused to grant summary judgment for Ross, holding that there was a disputed factual question: had Ross used the copied headnotes to replicate creative expression or simply to analyse language patterns?
Although the Thomson Reuters court did not reject (or adopt) the fair use defence, this case is an important precedent for the growing number of AI copyright cases pending in the US. If the decision is a bellwether, the ultimate question of fair use may need to be resolved by juries on a case-by-case basis and will hinge on the training process and the potential purpose and use of the outputs.
With this in mind, we will look further into the class action complaints and motions to dismiss filed in the Tremblay v OpenAI and Kadrey v Meta cases.
In June 2023, Paul Tremblay and Mona Awad were the first to file a class action against OpenAI in the Northern District of California.[11] The complaint questions the legitimacy of OpenAI’s LLM and claims that the plaintiffs’ books were copied without consent or fair compensation. Similar claims followed from comedian Sarah Silverman, Richard Kadrey, and Christopher Golden, who filed two separate copyright infringement lawsuits against Meta and OpenAI in the Northern District of California in July 2023.[12]
In the case against OpenAI, Tremblay questions OpenAI’s model and claims that copyrighted books were used as training material without authorisation or fair compensation. The lawsuit explains that in 2018, OpenAI revealed that it had trained its first Generative AI model, GPT-1, on a library of “7,000 unique unpublished books” called “BooksCorpus”. When introducing GPT-3 in July 2020,[13] OpenAI disclosed that 15% of its dataset originated from “two internet-based books corpora” that it called “Books1” and “Books2”, which allegedly contain over 350,000 titles combined. The content of these datasets has not been disclosed, and OpenAI’s latest model, GPT-4, was released with no information about its training datasets at all. Tremblay claims that ChatGPT generates “very accurate” summaries of copyrighted books, which, the plaintiffs say, suggests that the books were included in the training data. Because copyrighted works were used to train the AI model, the plaintiffs argue that every output generated by the model is an infringing derivative work in violation of the Copyright Act. The plaintiffs seek compensation for their contributions to the AI models, as well as a permanent injunction, actual damages, and restitution of profits. OpenAI filed a motion to dismiss the case, claiming its conduct falls under fair use protections. OpenAI stated that the authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”[14]
In the Kadrey v Meta[15] case, the authors claim that their books appeared in the LLaMA training dataset. The plaintiffs referred to the training data and the detailed table Meta released in February 2023 outlining the LLaMA training dataset, which included “the Books3 section of ThePile… a publicly available dataset for training large language models”.[16] The complaint describes this as an illegal shadow library.[17]
The plaintiffs have found their works in the “Books3” dataset and allege direct copyright infringement, vicarious copyright infringement, unfair competition, negligence, and unjust enrichment. In response, Meta advanced arguments very similar to OpenAI’s, stating that the authors had failed to argue that the LLM’s output was substantially similar to their works, which Meta claimed was a “basic” element of copyright infringement. Meta further contended: “Plaintiffs advance the fallacy that every output generated using LLaMA is based on expressive information extracted from Plaintiffs’ books and therefore an infringing derivative work of each of those books. The Ninth Circuit has rejected this argument as frivolous, and it makes no sense.” Meta also claims that the authors’ books made up “less than a millionth” of the material used to train its LLaMA model. In November 2023, the Court made an order dismissing all of the claims with the exception of the claim of direct copyright infringement, essentially the input infringement, which alleges that Meta copied the plaintiffs’ books to train the LLaMA model.[18] The order dismissed the allegations of output infringement on the basis that the complaint offers no allegations “of the contents of any output, let alone of one that could be understood as recasting, transforming, or adapting the plaintiffs’ books”.[19]
In another case, Authors Guild v. OpenAI,[20] similar arguments to those in the Tremblay and Kadrey cases are at issue, but this case places a stronger emphasis on the alleged harm occurring in the fiction market. In this complaint, one of the plaintiffs, Jane Friedman, claims to have found several AI-generated books on Amazon that wrongfully list her as an author.
There are also cases in which the legal grounds go beyond copyright issues. In T. v. OpenAI LP,[21] filed against OpenAI and Microsoft in September 2023 in the US District Court for the Northern District of California, two unnamed software engineers allege privacy law violations in the development of ChatGPT and other generative artificial intelligence systems. The complaint accuses the companies of training their fast-growing AI technology on “stolen data” from hundreds of millions of internet users via social media platforms and other sites. The complaint alleges that OpenAI used at least five distinct datasets to train ChatGPT for its own commercial benefit, including a non-profit dataset called “Common Crawl” containing massive amounts of personal data never intended for such use. The filing states, “OpenAI is now worth around $29B, yet the individuals and companies that produced the data it scraped from the internet have not been compensated.”[22] The plaintiffs claim that “once trained on stolen data, Defendants saw the immediate profit potential and rushed the Products to market without implementing proper safeguards or controls to ensure that they would not produce or support harmful or malicious content and conduct that could further violate the law, infringe rights, and endanger lives.”[23]
In a sign that cases relating to LLMs in the US are not slowing down, on 27 December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft for copyright infringement.[24] The case relates to the ChatGPT and Microsoft “Bing Chat” products and alleges that the LLMs were trained on millions of copyrighted articles, in-depth investigations, opinion pieces, and other content, all originating from The New York Times. The case alleges that the Defendants “gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works.”[25] This case also illustrates the cross-over between copyright and competition law, as it alleges that The Times’s work has been used to create competing AI products. The lawsuit states that Bing’s search index copies and categorises The Times’s content to generate excerpts and detailed summaries, which “undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.”[26] The lawsuit claims that Microsoft and OpenAI owe “billions of dollars” in damages.
In another complaint filed on behalf of news publishers in December 2023, Helena World Chronicle LLC v Google LLC,[27] Google’s LLMs are examined in a broader context. The class action complaint alleges anti-competitive conduct by Google in violation of sections 1, 2, and 3 of the Sherman Act and section 7 of the Clayton Act. The complaint alleges that Google abused its dominant position and misappropriated publishers’ content to publish its own news and reference content via different mechanisms, such as its “Knowledge Panel”, summary content displayed to the right of users’ search results, and its “Featured Snippets”, summaries displayed in a box above search results. The complaint alleges that Google’s AI technologies have exacerbated this conduct, with Google’s new Search Generative Experience product providing headline news from different publishers, “making it a one-stop shop”,[28] and also alleges “input” violations in a competition law context, arising from the training of Google’s Bard chatbot using news publishers’ content.
A final point of note is the adoption of multi-jurisdictional strategies in private actions. Getty Images has filed complaints in both the US and the UK against Stability AI over allegations that Stability AI trained its image-generating AI models on millions of Getty’s copyrighted images.[29] The UK case faced a jurisdictional challenge, with Stability AI arguing that the models had been trained entirely overseas and therefore could not constitute a copyright infringement in the UK. In December 2023, the case survived this challenge, with the UK’s High Court finding that the claim should proceed.[30] This demonstrates that, in the right circumstances (for example, where harm has been suffered by UK or European market participants in the relevant jurisdiction), potential claimants may be able to avail themselves of a multi-jurisdictional approach, even against the current players, who are principally based in the US.
Developing Regulation
While private enforcement in this sector is developing principally in the US, public enforcement authorities have been reacting to the advances in Generative AI on both sides of the Atlantic.
EU
As part of its digital strategy, the EU is implementing AI regulation[31] to ensure better conditions for the development and use of this innovative technology. In April 2021, the European Commission proposed the first EU regulatory framework, the Artificial Intelligence Act. In December 2023, a political agreement on the text of the regulation was reached, and the proposed law now requires approval from the European Parliament and the Council before it can be formally adopted as EU law.
The proposed European legislation sets out a regulatory framework based on a risk-based approach, which defines four levels of risk in AI:[32]
- Unacceptable risk: systems that are considered a threat to people and will be banned.
- High risk: systems that negatively affect safety or fundamental rights and will be assessed before being put on the market.
- Limited risk: systems that pose a limited risk (e.g. chatbots) and will be subject to transparency obligations.
- Low and minimal risk: all other AI systems, which will not be subject to legal obligations, although the AI Act envisages the creation of a voluntary code of conduct.
The higher the risk classification of an AI system, the more stringent the regulation that will apply. Under the proposed EU AI Act, the risk level of each AI system will need to be assessed. The European Parliament has said that non-compliance with the Act will result in fines of between 1.5% and 7% of total global annual turnover.
Under the proposed law, Generative AI products will be required to meet transparency obligations. Companies that distribute Generative AI products will need to publish detailed summaries of the content used to train their models and will need to comply with EU copyright law. Generative AI products will also need to be designed to comply with the law and prevent the generation of illegal content.
UK
While the UK is also taking steps towards regulating AI, the UK Government has endorsed a ‘light touch’ approach to AI regulation. In March 2023, the UK Government published its policy paper “A Pro-Innovation Approach to AI Regulation”, which was open for consultation until June 2023 (with the Government currently analysing the responses). The Government set out five principles that will underpin the UK’s AI regulatory approach. These are:
- Safety, security, and robustness
- Appropriate transparency and explainability
- Fairness
- Accountability and governance
- Contestability and redress
The policy paper outlined various existing legal regimes through which AI-related harms could attract legal accountability and which may provide avenues for redress. These include tort law, where a civil wrong has caused harm; data protection laws, which require personal data to be processed fairly; product safety laws, which require goods placed on the market to be safe; and consumer laws, which could govern sales contracts for AI-based products and services. The paper also notes, however, that it is not yet clear whether consumer rights law will provide the right level of protection for integrated AI or AI services, or how tort law would fill any gaps in consumer law. It remains an open question whether current legal regimes are fit for purpose, and the Government expects existing regulators to explain the current routes to contestability and redress under their regimes.
At this stage, the Government does not plan to enact new legislation, stating that implementing rigid and onerous requirements could hold back AI innovation, but instead plans to issue its five principles on a non-statutory basis, to be implemented by existing regulators. The UK Government’s open approach was reflected in the AI Safety Summit held on 1-2 November 2023 at Bletchley Park, where Prime Minister Rishi Sunak highlighted both the opportunities of “the greatest breakthrough of our time” and the need to give people “the peace of mind that we will keep them safe”. The Prime Minister announced the establishment of an AI Safety Institute, which aims to develop infrastructure to understand the risks of AI and enable its governance. In this regard, it was also announced that the UK had entered a “landmark agreement” with tech companies to test their models before release. The tech companies have pledged to give the Institute priority access to undertake safety evaluations.
USA
The USA is following a similar approach to the UK with several policy initiatives that aim to guide the design and development of AI. In October 2022, the White House released a “Blueprint for an AI Bill of Rights,”[33] which outlined five principles that are framed as protections that the public should be entitled to in relation to AI:
- Safe and Effective Systems
- Algorithmic Discrimination Protections
- Data Privacy
- Notice and Explanation
- Human Alternatives, Consideration and Fallback
The Blueprint was accompanied by a technical guide to assist industry and government in implementing the principles.
In January 2023, the US Department of Commerce’s National Institute of Standards and Technology released an AI Risk Management Framework,[34] and the National Artificial Intelligence Research Resource (NAIRR) Task Force released its final report entitled “Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem.”[35]
The AI Risk Management Framework aims to mitigate the risks of AI technologies. It was developed in collaboration with the private sector as a voluntary resource for organisations designing AI systems. Complementary to this is the Research Resource proposed in the NAIRR Task Force’s report, which would bring together a large range of tools (including data). The aim of the Research Resource is to democratise AI research and development. Such a resource, if implemented, could be effective in widening the number of organisations developing AI products and increasing competition with Big Tech products.
More recently, the US has increased its focus on monitoring AI, with President Biden signing an Executive Order on 30 October 2023 requiring tech companies to share test results for their AI systems with the government before those systems are released. The Order applies to AI models that pose a threat to national security, economic security, or health and safety, and the Government will develop the testing guidelines. This was followed by Vice President Harris’s announcement, at the UK’s November 2023 Safety Summit, of a US AI Safety Institute that will work alongside the UK initiative to test the most advanced AI models.
Outside of direct AI regulation, it is worth noting that other legislative measures have sought to meet the challenge posed by AI and the enhancement it brings to tech business models. For example, the proposed Journalism Competition and Preservation Act (JCPA), introduced in Congress in 2021, sought to give news publishers collective bargaining rights to band together and negotiate deals with tech platforms for the use of their content. In this respect, the JCPA is similar to Australia’s 2021 News Media Bargaining Code and Canada’s 2023 Online News Act. The JCPA’s progress has stalled since being placed on the Senate Legislative Calendar in 2022. However, the JCPA (and similar legislation in other jurisdictions) demonstrates the role legislation can play in supporting regulatory initiatives: where affected parties, such as news publishers, are able to negotiate for the use of their content under legislative frameworks, the impact of AI and LLMs could be mitigated.
What comes next?
The US proceedings outlined above are likely to set important precedents, particularly as to the approach the courts take to the fair use doctrine in the training of LLMs and the extent to which privacy rights apply to future Generative AI development. The progress of these cases may serve to highlight the areas of uncertainty and test the robustness of the guardrails currently in place to protect the rights of affected parties and market participants. The speed at which private actions have been brought in the US signals that private enforcers are not waiting for governments and regulators to step in but are playing an active role in challenging these technologies. Private actions are likely to play a similarly complementary role in other jurisdictions, and the high-level approach taken to regulation in jurisdictions such as the UK arguably leaves wide latitude for private enforcement to complement public enforcement, clarify the application of the law, and set out specific standards in practice to protect the rights of affected parties.
The legal challenges faced by Generative AI are unlikely to be restricted to those that have surfaced to date. For example, given the opacity of the marketplace, it is not yet clear whether one or more tech companies will come to dominate the Generative AI market on an ongoing basis. Will this technology result in one player initially capturing the market with innovative new tech, only to hinder the development of newer and smaller players? Relatedly, will players use Generative AI capabilities to attempt to monopolise, or abuse their dominance in, other markets? The competition risks associated with the Generative AI market include high barriers to entry due to the sheer scale of the data and computing required to train and run an LLM. The vertical integration of the tech giants operating in this space may also lead to anti-competitive leveraging and discrimination.[36]
In the UK and US, public regulation is proceeding at a measured pace in contrast to the EU, which has developed specific regulation. In the US, private enforcement initiatives are gathering momentum, whereas in the UK and Europe, private enforcement has yet to have the same material impact. However, looking to the future, the legislative context in the UK and Europe, in particular with differing competition, copyright, data protection and privacy rights, may provide fertile ground for development of guardrails through private enforcement. While those actions may take more time to emerge, they may yet be instrumental in establishing the legal basis for accountability in this rapidly developing field, and achieving the optimistic vision outlined by Rishi Sunak at Bletchley Park: that “safely harnessing this technology could eclipse anything we have ever known.”[37]
*Scott Gilmore is a Partner in Washington D.C., Luke Streatfeild is a Partner in London, and Shahrina Quader is an Associate in London
Footnotes
[1] McKinsey, “The State of AI in 2023: Generative AI’s Breakout Year”, 1 August 2023.
[2] Google, “Transformer: A Novel Neural Network Architecture for Language Understanding”, 31 August 2017.
[3] Id. at “Accuracy and Efficiency in Language Understanding”.
[4] BBC News, “Chat GPT Can Now Access Up to Date Information”, 27 September 2023.
[5] The Atlantic, “Revealed: The Authors Whose Pirated Books Are Powering Generative AI”, 19 August 2023.
[6] The Financial Times, “Generative AI Exists Because of the Transformer”, 12 September 2023.
[7] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev, “LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models”, 16 October 2022.
[8] Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith (2023) 143 S. Ct. 1258, 1274.
[9] Ibid.
[10] Thomson Reuters v. Ross Intelligence (2023) United States District Court for the District of Delaware, 1:20-cv-613-SB.
[11] Tremblay et al v. OpenAI Inc. et al (2023) United States District Court for the Northern District of California, 3:23-cv-03223.
[12] Silverman v. OpenAI, Inc. (2023) United States District Court for the Northern District of California, 3:23-cv-03416 and Kadrey v. Meta Platforms Inc. (2023) United States District Court for the Northern District of California, 3:23-cv-03417.
[13] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, S., “Language Models are Few-Shot Learners”, Advances in Neural Information Processing Systems, vol. 33, pp. 1877-1901.
[14] National Law Review, “Meta and OpenAI File Motion to Dismiss Sara Silverman Copyright Infringement Case”, 21 September 2023.
[15] Kadrey v. Meta Platforms Inc. (2023) United States District Court for the Northern District of California, 3:23-cv-03417.
[16] Ibid 4.
[17] Ibid 6.
[18] Kadrey v. Meta Platforms Inc. (2023) United States District Court for the Northern District of California, 3:23-cv-03417.
[19] Ibid.
[20] Authors Guild v. OpenAI Inc. (2023) United States District Court for the Southern District of New York, 1:23-cv-08292.
[21] T. v. OpenAI LP (2023) United States District Court for the Northern District of California, 3:23-cv-04557.
[22] Ibid 29.
[23] Ibid 3.
[24] The New York Times Company v Microsoft Corporation (2023), United States District Court for the Southern District of New York, 1:23-cv-11195.
[25] Ibid 2.
[26] Ibid 3.
[27] Helena World Chronicle v Google LLC (2023), 1:23-cv-03677. This complaint was filed by Hausfeld’s US office.
[28] Ibid 75.
[29] Getty Images (US), Inc v Stability AI, Inc (2023), 1:23-cv-00135; Getty Images (US) Inc & Ors v Stability AI Ltd (2023), IL-2023-000007.
[30] Getty Images (US) Inc & Ors v Stability AI Ltd [2023] EWHC 3090 (Ch).
[31] European Parliament, EU AI Act: First regulation on Artificial Intelligence, 8 June 2023.
[32] Ibid.
[33] The White House, “Blueprint for an AI Bill of Rights: A Vision for Protecting Our Civil Rights in the Algorithmic Age”, 4 October 2022.
[34] National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 2023.
[35] National Artificial Intelligence Research Resource Task Force, Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem: An Implementation Plan for a National Artificial Intelligence Research Resource, January 2023.
[36] See, Thomas Hoppner, Luke Streatfeild, “ChatGPT, Bard & Co: an introduction to AI for competition and regulatory lawyers”, Hausfeld Competition Bulletin, 23 February 2023.
[37] Gov UK, “Prime Minister's speech at the AI Safety Summit”, 2 November 2023.