Key Trends and Forecasts Influencing the GenAI Market in 2024

Building on the article on the copyright debate in GenAI, which delves into the challenges and expert perspectives, this blog post continues to explore the evolving landscape through predictive insights. As we look ahead, the interplay between technology, law, and ethics becomes increasingly complex, prompting a closer examination of potential legal frameworks and market shifts.

Court Decisions are to Shape AI Copyright Laws

The ongoing debates over copyright for AI-generated content have produced a number of cases that may set crucial precedents for the legal framework. Below are just a few recent cases that encapsulate the current legal challenges:

Getty Images v. Stability AI: Getty Images has sued Stability AI in both the US and the UK for allegedly using over 12 million of Getty’s images without permission to train their AI models. This ongoing case is pivotal as it addresses the legality of using copyrighted content for training GenAI without authorization.

Tremblay v. OpenAI: Authors have accused OpenAI of copyright infringement for allegedly using their works without permission to train ChatGPT. This lawsuit highlights the complex legal territory concerning the training of AI models with potentially copyrighted material.

Nazemian v. Nvidia: This case involves a direct copyright infringement claim against Nvidia for including copyrighted works in the training dataset of its NeMo Megatron LLM series. The case revolves around the use of “The Pile” dataset, which allegedly includes copyrighted material.

In addition to the disputes mentioned above, there are other high-profile cases, such as the dispute involving Midjourney, Stability AI, and DeviantArt, and the case between OpenAI and the New York Times that we previously discussed, all of which remain unresolved. The outcomes of these cases could significantly influence the regulatory framework and operational strategies of AI companies moving forward.

Companies are to License Content to Train AI

Companies are likely to assess the risks associated with using scraped content, potentially leading to a shift towards licensing content for training purposes. By partnering with established content providers and obtaining proper licenses, AI companies can mitigate legal risks while accessing high-quality datasets for training their models. Such partnerships underscore the importance of adopting legal and ethical practices in the development of AI technologies.

Recent partnerships, such as the collaboration between BRIA and Getty Images or OpenAI’s agreements with Le Monde and Prisa, exemplify this trend. Other examples reflect a broader shift in the industry, where companies proactively enter into agreements that could influence future legal interpretations and regulations regarding AI-generated content:

OpenAI and Axel Springer: OpenAI entered into a licensing agreement with Axel Springer, a large media company that owns several prominent publications including Business Insider and Politico. This deal allows OpenAI to use Axel Springer’s content to train its generative AI models and integrate news stories into responses provided by its AI-powered chatbot, ChatGPT. This arrangement includes financial compensation for Axel Springer and helps OpenAI enhance the relevance and accuracy of its AI applications by using high-quality, licensed content.

Apple and Shutterstock: Apple has secured a deal with Shutterstock to license millions of images for AI training. This move is part of Apple’s broader strategy to enhance its AI capabilities across its product lines, including the iPhone and iPad. By licensing images from Shutterstock, Apple ensures that its AI models are trained on legally obtained and diverse visual content, which is crucial for the development of accurate and robust AI-driven features.

Reddit and an Unnamed AI Company: Reddit has reportedly signed a $60 million annual contract with an unnamed major AI company. This deal allows the AI company to use Reddit’s user-generated content to train its models. Such agreements highlight the growing importance of social media data in AI development and the need for platforms to monetize their user-generated content while ensuring compliance with copyright norms. However, Reddit does not compensate content creators, which raises ethical questions about who owns the content and whether it is fair to license it without sharing profits with the people who created it.

Given the substantial investments in acquiring legally compliant datasets, it’s evident that companies are preparing for a scenario where courts may increasingly rule in favor of authors. Such strategic moves indicate a proactive approach by companies to align with potential legal outcomes that could enforce stricter copyright rules. The emergence of a dataset market is a logical outcome of this movement, suggesting a shift towards more responsible use of data in AI development. 

Dmitry Shironosov, CEO of Everypixel

Emergence of Dataset Market

As AI companies recognize the importance of obtaining legally clean data for training their models, the demand for high-quality datasets has surged. This upswing is reflected in recent licensing deals by major AI companies such as OpenAI and Apple. While definitive court rulings have yet to establish a consistent legal framework for data use in AI, there is growing anticipation that future judicial decisions could favor content authors, which makes proper licenses for training data all the more important. Companies may now explore purchasing datasets from specialized providers to mitigate legal risks and ensure compliance with copyright laws.

vAIsual: vAIsual is a company that exemplifies this trend. They have established themselves as a leader in the dataset marketplace for the AI industry, providing legally clean datasets tailored for AI training. From its inception, vAIsual has been committed to delivering ready-made datasets that save time and mitigate legal risks for AI innovators. Their custom dataset services cater to global companies seeking high-quality data for training their AI models, highlighting the growing demand for legally compliant datasets in the AI ecosystem.

Further evidence of this emerging market and of the need for data comes from research published by Jared Kaplan. Kaplan’s findings illustrate that as AI models are trained on increasingly large datasets, their ability to accurately interpret and generate human-like content improves significantly. This “scale is all you need” approach leads companies to seek out datasets that are both expansive and legally clean, in order to improve their AI models and meet expected legal regulations.

According to a report by Straits Research, the AI Training Dataset Market is projected to reach USD 7.23 billion by 2030, growing at a compound annual growth rate of 20.8% from 2022 to 2030. This growth is fueled not only by increasing dataset sizes but also by the diversification of dataset types. Historically, specialized datasets have dominated the market, particularly those for sectors like healthcare and manufacturing, which require data that cannot be easily gathered from public domains. However, with the rising need for generative content for AI applications, datasets built for generative AI are expected to become an integral part of this expanding market.
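To put the forecast in perspective, the two reported figures can be used to back out an approximate starting market size. The sketch below is only arithmetic on the numbers quoted above. It assumes eight annual compounding periods between 2022 and 2030, and the function names are illustrative; the resulting base value is an implied estimate, not a figure taken from the report.

```python
def implied_base_value(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a final value and a CAGR."""
    return final_value / (1.0 + cagr) ** years


def project(base_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


if __name__ == "__main__":
    # Figures quoted in the text: USD 7.23 billion by 2030, 20.8% CAGR.
    # Assumption: 8 compounding periods (2022 -> 2030).
    base_2022 = implied_base_value(7.23, 0.208, 8)
    print(f"Implied 2022 market size: USD {base_2022:.2f} billion")
    print(f"Growth multiple over the period: {7.23 / base_2022:.2f}x")
```

Under these assumptions the implied 2022 market is roughly USD 1.6 billion, which means the forecast amounts to the market growing about four and a half times over eight years.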

Additional Copyright Protection Mechanisms for Creators

The landscape for creators could shift with more clarity and articulation around opting out and the emergence of additional copyright protection mechanisms, such as the growing use of data poisoning. It is worth mentioning that many artists initially reacted to AI-generated content with skepticism and resistance. This was vividly expressed in various movements and public outcries, such as the artist-led revolt against AI-generated art on platforms like ArtStation, where creators felt their rights and contributions were being undervalued and exploited without proper attribution or compensation. This wave of dissent has since evolved into a more structured and practical approach to protecting artists’ rights. Creators and legal experts are now developing and advocating for mechanisms that allow artists to opt out of having their work used by AI without their consent, as well as for solutions like data poisoning that protect original work from unauthorized use:

Nightshade Tool for Artists: Artists are increasingly leveraging tools like Nightshade to introduce “poisoned” data into the training sets of AI models. This data corrupts the AI’s learning process, potentially leading to malfunctioning models when they train on these poisoned datasets. Such techniques are being used as a method for artists to protest and protect their works from being used without permission by AI companies. The disruptions caused by these poisoned inputs force AI systems to produce erroneous outputs, which could compel AI developers to reconsider their data-sourcing strategies and respect copyright laws more stringently.

Additionally, explicit opt-out options implemented by major companies such as OpenAI and Getty Images reflect a growing recognition of the importance of respecting creators’ rights and giving them greater control over the use of their works in AI applications.
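For creators who publish on their own websites, one common form of opt-out is a robots.txt rule aimed at AI crawlers. The sketch below uses Python’s standard urllib.robotparser to check whether a given robots.txt tells a crawler to stay away from a page. GPTBot is the user agent OpenAI documents for its crawler; the function name, the example file, and the example URL are illustrative assumptions. Keep in mind that robots.txt is advisory and only works when the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Crawler user agents to check. "GPTBot" is OpenAI's documented crawler;
# extend this tuple with other agents as needed (assumption: you know their names).
AI_CRAWLERS = ("GPTBot",)

# Illustrative robots.txt that opts the whole site out of GPTBot crawling.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def blocked_crawlers(robots_txt: str, page_url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the crawler names that this robots.txt disallows for page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, page_url)]


if __name__ == "__main__":
    # Hypothetical portfolio URL used only for demonstration.
    url = "https://example.com/portfolio/"
    print(blocked_crawlers(EXAMPLE_ROBOTS_TXT, url))
```

A check like this only confirms what the file says. Whether a particular crawler respects it is a separate question, which is part of why artists are also turning to tools like Nightshade.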

Demand for Transparency in AI Content Creation

Trust and transparency are emerging as key values for clients. This trend reflects a growing demand for clear disclosure when AI is used to generate content, particularly in sensitive areas like editorial and documentary production.

One notable example comes from Zach Seward’s discussions at SXSW 2024, where he highlighted the importance of maintaining editorial integrity as news organizations increasingly experiment with AI. Seward’s approach emphasizes transparency in how AI-generated content is used and presented to the public, ensuring that audiences are aware of the nature of the content they are consuming.

Similarly, the Archival Producers Alliance advocates for clear guidelines and transparency in the use of AI, particularly concerning the use of archival material in documentaries. Their initiative supports the implementation of rules that govern the ethical use of such content, promoting transparency to maintain the trust of viewers and respect the origins and authenticity of historical materials.

In addition to these efforts, the emergence of groups like the Content Authenticity Initiative (CAI) and the Coalition for Content Provenance and Authenticity (C2PA) plays a crucial role. These organizations are developing technologies and standards to enable better transparency regarding the origin and history of digital content. By supporting the use of digital provenance, these efforts aim to combat misinformation and ensure that consumers can verify the authenticity and integrity of content.

While there are no universally established rules yet, organizations are taking proactive steps to formulate and adhere to internal guidelines that ensure transparency and ethical use of AI in their content creation processes. In one of our previous articles, we recommended several practices for using AI in marketing activities and design projects, including:

  • Stay informed about the legal landscape
  • Always adapt AI outputs
  • Don’t share confidential information with AI
  • Disclose AI use and build trust with clients

Personalized Marketing Powered by AI

Personalization stands out as a driver for marketing and advertising agencies. A trend may emerge where companies leverage their product content to craft brand-specific AI-generated content, enhancing the personalized touch in marketing strategies. Here are some cases and strategies that highlight this movement:

Adobe’s Content Supply Chain Solution: Adobe has developed a Content Supply Chain solution that deeply integrates generative AI capabilities to streamline content creation across brands. This platform allows brands to generate personalized marketing copy and other content types that align perfectly with their digital strategies, ensuring that all communications are consistently branded across various channels. This system is designed to significantly speed up content creation while allowing for high degrees of personalization.

Typeface Hub’s Multimodal AI: Typeface Hub has enhanced its multimodal AI capabilities to offer greater control over brand-specific content generation. Their updated Brand Kit automatically adjusts to a brand’s unique voice and style, ensuring that all content, whether text or images, adheres strictly to predefined brand guidelines. This is particularly useful for maintaining consistent branding in large-scale digital marketing campaigns.

Flair.AI: Flair.AI is an AI design tool for product photoshoots, visualizations, and animations. It is designed specifically for marketers and brands to generate brand-specific content. This tool enables brands to create content that is not only consistent with their identity but also adjusted to meet the unique preferences of their audience.

In conclusion, it is clear that the intersection of technology, legal frameworks, and ethical considerations will continue to shape the future of creative content. Companies, creators, and legislators should collaborate to foster an environment that respects copyright integrity while promoting innovation. The ongoing debates and emerging trends will undoubtedly influence the strategies and policies that will govern the technological advancements in AI.
