Top AI News, April 2024

In this monthly roundup, we spotlight the top AI news stories from April:

Adobe Buys Videos for its AI

Adobe is actively purchasing videos to develop its AI text-to-video generator. The company is encouraging its network of photographers and artists to submit videos depicting everyday actions and emotions, offering compensation averaging $2.62 per minute of footage. Additionally, Adobe is exploring partnerships with third-party AI providers such as Runway, Pika Labs, and potentially OpenAI’s Sora models. 

The context behind: Adobe’s interest in buying videos from photographers and artists reflects a broader trend of companies relying on licensed content to train AI models. By obtaining proper licenses, companies can mitigate legal risks while accessing high-quality datasets for training their models.

Adobe is also set to introduce AI video tools to its Premiere Pro editing platform, with plans to integrate its own generative AI video model into the Firefly family. These tools, which include the ability to generate and manipulate video content using text prompts, aim to enhance the editing experience for users.

Ethical Concerns Arise Over Adobe Firefly’s Training Data

Adobe’s image-generating software, Firefly, touted for its ethical training data practices, has stirred controversy after revelations that it was trained using images from Midjourney, among other sources.

Despite Adobe’s initial claims that Firefly relied mainly on licensed images from Adobe Stock, it appears that AI-generated content, including images made with competitors’ tools, contributed to Firefly’s training. Adobe Stock is one of the few stock image platforms that accept content generated in third-party services. Since Adobe trains its algorithm on Adobe Stock content, third-party AI-generated images in the library inadvertently end up in the training data for tools like Firefly.

Despite the revelation, Adobe maintains that it controls the quality of its dataset:

“Every image submitted to Adobe Stock, including a very small subset of images generated with AI, goes through a rigorous moderation process to ensure it does not include IP, trademarks, recognizable characters or logos, or reference artists’ names.”  

An Adobe spokesperson


Between the lines: This revelation challenges the narrative of Firefly as a “commercially safe” alternative, raising questions about transparency and ethical standards in the development of AI models.

Meta AI’s Worldwide Rollout

Meta AI, powered by Meta Llama 3, is expanding its reach globally with new features aimed at making everyday tasks easier and more enjoyable.

Available on Facebook, Instagram, WhatsApp, and Messenger, Meta AI is now accessible in over a dozen countries, including Australia, Canada, and Nigeria. Users can rely on Meta AI for a wide range of tasks, from recommending restaurants that match specific preferences to explaining complex concepts like hereditary traits.

Moreover, the integration of Meta AI into the Meta ecosystem, including search functionalities and image generation capabilities, enhances the user experience across platforms. With the Imagine feature, users can generate images from text in real-time, with sharper quality and the ability to include text within images.

The context behind: Against the backdrop of the ongoing AI race, it’s evident that Meta is intensifying efforts to bridge the gap with competitors and trying to establish itself as a frontrunner in the AI landscape.

Snap to Watermark AI-generated Images

Snap announced its intention to watermark AI-generated images on its platform, featuring a translucent version of its logo with a sparkle emoji. This move aims to signify images created using Snap’s AI-powered tools, enhancing transparency and safety for users.

The company clarified that removing these watermarks would violate its terms of use, although the method for detecting such removal remains undisclosed. Additionally, Snap has introduced indicators for AI-powered features and context cards for AI-generated images to provide users with more information.
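Snap has not published how its watermark is applied, but compositing a translucent logo over an image is standard alpha blending. A minimal, illustrative sketch in pure Python (function names, the 0.3 opacity, and the nested-list image representation are our assumptions, not Snap’s implementation):

```python
def blend_pixel(base, mark, alpha):
    """Alpha-blend one RGB watermark pixel over a base pixel.

    Per channel: out = mark * alpha + base * (1 - alpha).
    """
    return tuple(round(m * alpha + b * (1 - alpha)) for b, m in zip(base, mark))


def apply_watermark(image, logo, alpha=0.3, origin=(0, 0)):
    """Overlay a translucent logo onto an image.

    `image` and `logo` are 2-D lists of RGB tuples; `origin` is the
    (row, col) of the logo's top-left corner in the image.
    """
    out = [row[:] for row in image]  # copy so the input stays untouched
    r0, c0 = origin
    for r, logo_row in enumerate(logo):
        for c, mark in enumerate(logo_row):
            out[r0 + r][c0 + c] = blend_pixel(out[r0 + r][c0 + c], mark, alpha)
    return out
```

In practice a library such as Pillow would handle this via its compositing functions; the sketch just shows why a watermark at partial opacity remains visible without hiding the underlying content.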

Between the lines: Snap’s decision aligns with similar labeling initiatives by tech giants like OpenAI and Meta, and with the growing trend toward transparency and content provenance.

Coca-Cola x Microsoft

The Coca-Cola Company and Microsoft have forged a five-year strategic partnership aimed at accelerating cloud and GenAI initiatives. Coca-Cola’s commitment of $1.1 billion to Microsoft’s Cloud and GenAI capabilities showcases a significant step in its ongoing technology transformation. Leveraging Microsoft Azure and AI technologies, Coca-Cola aims to revolutionize various business functions, from marketing to manufacturing and supply chain management. By migrating all applications to Microsoft Azure and exploring AI-powered digital assistants, Coca-Cola seeks to enhance customer experiences, streamline operations, foster innovation, and uncover new growth opportunities.

The context behind: Coca-Cola stands as an exemplar of how non-tech brands can harness AI to gain a competitive edge. Leveraging AI, Coca-Cola enhances supply chain management, streamlines distribution, and improves customer experiences. Moreover, Coca-Cola has recently collaborated with OpenAI to launch the Masterpiece campaign, showcasing the brand’s innovative approach to marketing.

AI in Healthcare Operations

Profluent Bio has harnessed the power of GenAI to develop a groundbreaking gene editor named OpenCRISPR-1. The company trained its proprietary protein-designing large language model, ProGen2, on a vast database of Cas9 gene-editing proteins. This approach yielded novel gene-editing proteins capable of modifying human cells. The team also employed another AI system to generate the guide RNA needed for precise targeting. While keeping the design software proprietary, Profluent has released OpenCRISPR-1 to researchers, marking a significant advancement in the field of gene editing.

Moderna, a pharmaceutical and biotechnology company based in Cambridge, has teamed up with OpenAI to integrate ChatGPT Enterprise across its operations. With a focus on widespread adoption, Moderna embarked on an ambitious program to ensure proficiency in GenAI among all its employees. By fostering a culture of collective intelligence and investing in comprehensive change management initiatives, Moderna achieved impressive results, including the successful adoption of an AI chatbot tool built on OpenAI’s API, mChat, by over 80% of its workforce. Moreover, Moderna is pioneering the use of AI in clinical trial development, with innovative solutions like Dose ID, which streamlines data analysis and enhances decision-making processes.

Why it matters: These cases exemplify how AI is changing the world, and the healthcare industry in particular, for the better.

AI Film Conference

AI on the Lot is gearing up for an AI film conference on May 16, 2024, at LA Center Studios, attracting over 500 AI enthusiasts, filmmakers, and professionals. The event promises a rich schedule of film screenings, in-depth panel discussions with leaders in the field, hands-on workshops, and live demonstrations exploring the intersection of AI and filmmaking.

AI on the Lot 2024 will showcase esteemed speakers including Katja Reitemeyer, Director of Data Science & AI at NVIDIA, Kathryn Brillhart, Virtual Production Supervisor for notable films like Fallout and Rebel Moon, and Chad Nelson, a Creative Specialist at OpenAI, among others. The conference will highlight the convergence of technology and creativity in shaping the future of entertainment. 

Alexander Shironosov, our colleague and R&D Team Lead at Everypixel, has deepened our exploration of recent releases in AI models:

LLM:

  • Mistral – Mixture of Experts Mixtral-8x22B: A new large-scale model utilizing a Mixture of Experts architecture to enhance performance and efficiency.
  • Meta’s Llama 3 Release: Meta has introduced Llama 3 in two sizes, 8B and 70B parameters. The 8B version performs comparably to the much larger Llama 2 70B model.
  • Microsoft’s Phi-3: Following the success of Phi-1 and Phi-2 among small language models, Microsoft has released Phi-3. Early metrics showcased by ShareGPT4V, trained on Phi-3, suggest that Phi-3 outperforms heavier models, indicating potential widespread use in similar applications.
  • Apple’s OpenELM Initiative: Apple has launched a family of small, open-source AI models known as OpenELM, designed for on-device applications. The models vary in size — 270 million, 450 million, 1.1 billion, and 3 billion parameters.
  • FineWeb Release: FineWeb, a collection of text datasets sourced from the web (CommonCrawl), has been released under a permissive license (ODC-By).
  • Dolma Updates: An updated version of Dolma, a dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials, has been released.
  • Snowflake’s Arctic Base Model: Snowflake has released Snowflake Arctic and published a detailed exploration of its model, which employs a Mixture of Experts architecture, enhancing its capability to handle diverse AI tasks.
  • Startup Answer.AI’s Innovation: Answer.AI published an article and released code for its FSDP/QDoRA methodology, which allows fine-tuning the large-scale Llama 3 on just two GPUs with 24GB of memory each, showcasing an efficient way to manage resource-intensive AI training.
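Two of the models above (Mixtral-8x22B and Snowflake Arctic) use a Mixture of Experts architecture. Neither vendor’s implementation is reproduced here, but the core routing idea — run each token through only the top-k scoring experts and mix their outputs — can be sketched in a few lines of toy Python (all names and sizes are illustrative):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, experts, gate_scores, top_k=2):
    """Route a token through only the top-k experts.

    `experts` is a list of callables (standing in for expert FFNs);
    `gate_scores` are the router's raw logits for this token. Only the
    top-k experts run, and their outputs are mixed by renormalized gate
    weights — which is why an 8-expert model pays roughly the compute
    cost of 2 experts per token.
    """
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))
```

A real MoE layer does this per token per layer with learned gating networks and load-balancing losses; the sketch only shows the sparse-routing arithmetic.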

VLM:

  • InternVL 1.5: This open-source model features a robust visual encoder and has been trained on images ranging from 448×448 to 4K×4K resolution, using a high-quality dataset. By some measures, InternVL 1.5 outperforms top commercial models like GPT-4V, Claude 3 Opus, and Gemini 1.5 Pro.
  • New Benchmark for VLM Testing: A new version of the benchmark specifically designed for testing visual language models on images containing a large amount of text has been released. This benchmark aims to provide a more rigorous assessment of how well VLMs handle complex visual-textual interactions, which is crucial for improving their practical applications.

Video Generation:

  • Microsoft’s Talking Head Model: Microsoft has introduced a new model that generates “talking faces” videos from audio inputs and photos. Utilizing diffusion models, this new approach significantly surpasses previous methods across all major performance metrics. This release could revolutionize how dynamic video content is created from static images and sound.

Image Generation:

  • Imgsys for Text-to-Image Models: A new platform named Imgsys has been launched to facilitate pairwise comparisons and build an Elo rating for various text-to-image models. This includes checkpoints for models like SDXL and independent models comparable to Pixart-Sigma.
  • NVIDIA’s Diffusion Model Enhancements: NVIDIA has published two articles detailing methods to improve the quality of image generation using diffusion models without the need for direct model retraining. The first approach utilizes a schedule for classifier-free guidance to enhance image clarity, while the second method optimizes denoising steps to refine the output further.
  • Improved IP Adapter for Portrait Generation: An enhanced IP adapter has been developed for generating accurate and detailed portraits from photographs. This tool applies advanced image processing techniques to enhance the realism and quality of the generated portraits.
  • Meta’s Diffusion Model Acceleration: Meta has released an article detailing their new approach, “Imagine Flash,” aimed at accelerating diffusion models through a technique called backward distillation. This method significantly speeds up the processing time of diffusion models while maintaining, or even enhancing, the quality of generated images.
  • Adobe Firefly v3 for Photoshop: Adobe has launched Firefly v3, a new version of its model integrated into Photoshop. The tool allows users to generate specific objects, alter backgrounds, and create new images from scratch.
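Imgsys’s exact rating formula isn’t public, but building an Elo rating from pairwise votes, as described above, is a standard recipe. A minimal sketch (the K-factor of 32 and the function names are our assumptions):

```python
def expected_score(r_a, r_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, a_wins, k=32.0):
    """Update both ratings after one pairwise comparison.

    `a_wins` is 1.0 if the image from model A was preferred,
    0.0 if B's was, and 0.5 for a tie. The total rating mass is
    conserved: whatever A gains, B loses.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (a_wins - e_a)
    r_b_new = r_b + k * ((1.0 - a_wins) - (1.0 - e_a))
    return r_a_new, r_b_new
```

Running every vote through `update_elo` yields a leaderboard where upsets against highly rated models move the numbers most, which is what makes Elo a natural fit for crowd-sourced image comparisons.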
