In this monthly roundup, we spotlight the top AI news stories from March:
Content Credentials by BBC
BBC News has rolled out a ‘content credentials’ feature aimed at countering disinformation by confirming the origin and authenticity of images and videos. The technology embeds verification information directly into the media, so its integrity can be checked even when it is shared outside the BBC. Users can now access verification details by clicking a button labeled ‘how we verified this,’ which reveals the process used to confirm authenticity. The feature builds on a technical standard called ‘Content Credentials,’ developed with the Coalition for Content Provenance and Authenticity (C2PA), which provides a traceable history of a piece of media’s origins and edits.
Between the lines: C2PA is a Joint Development Foundation project formed by Adobe, Arm, Intel, Microsoft, and Truepic to certify the source and provenance of digital content. BBC’s move to introduce a ‘content credentials’ feature is a significant step in countering fake news and misinformation, and it aligns with broader efforts across the tech industry to establish the provenance of AI-generated content, as exemplified by OpenAI adding C2PA metadata to images produced by its DALL·E 3 generator.
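To make the mechanism concrete, here is a toy Python sketch of the core idea behind content credentials: a manifest records a cryptographic hash of the media bytes together with provenance claims, so any later modification of the file no longer matches the manifest. Real C2PA manifests are embedded in the asset itself and signed with certificates; the helper functions below are hypothetical and only mimic the hash check.

```python
import hashlib
import json

def make_manifest(media_bytes: bytes, claims: dict) -> dict:
    """Record provenance claims plus a hash of the media content (toy example)."""
    return {"claims": claims, "content_sha256": hashlib.sha256(media_bytes).hexdigest()}

def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """Check that the media bytes still match the hash stored in the manifest."""
    return hashlib.sha256(media_bytes).hexdigest() == manifest["content_sha256"]

photo = b"...raw image bytes..."
manifest = make_manifest(photo, {"producer": "BBC News", "edits": ["crop", "tone adjustment"]})

print(verify_manifest(photo, manifest))                 # True: bytes match the manifest
print(verify_manifest(photo + b"tampered", manifest))   # False: any change is detected
print(json.dumps(manifest, indent=2))
```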
NVIDIA Copyright Infringement
NVIDIA is facing a lawsuit over alleged AI copyright infringement. Authors Abdi Nazemian, Brian Keene, and Stewart O’Nan have filed a proposed class action against the company, accusing it of training its NeMo AI on a dataset containing their books without permission. NeMo, NVIDIA’s AI platform, facilitates the creation and training of chatbots. The authors claim it was trained on a contentious dataset known as Books3, which reportedly includes pirated copies of their works. The plaintiffs demand damages and the destruction of all copies of the dataset used. NVIDIA asserts that NeMo was developed in compliance with copyright law, emphasizing its respect for content creators’ rights.
Between the lines: This lawsuit adds to a string of copyright infringement cases involving prominent tech companies and content creators, underscoring the complexity of legal issues arising from AI technologies.
OpenAI and Elon Musk
Elon Musk has filed a lawsuit against OpenAI and its CEO, Sam Altman, alleging a departure from the organization’s original mission of benefiting humanity. Musk claims that OpenAI’s investment deal with Microsoft has turned the company into a profit-driven entity focused on developing AGI for Microsoft’s gain rather than for the betterment of humanity. The lawsuit also accuses OpenAI of keeping the details of GPT-4, its most powerful model, secret, a departure from its founding principles of openness and transparency.
In response, OpenAI has filed papers seeking dismissal of all of Musk’s claims and shared some facts about its relationship with him. The company expresses sadness over the discord with someone it deeply admired, crediting Musk with inspiring it to aim higher, and reiterates its core mission of ensuring that the benefits of AGI are accessible to all humanity. Despite Musk’s departure and their divergent views on for-profit structures, OpenAI says it remains steadfast in advancing that mission.
The context behind: Elon Musk was one of the co-founders of OpenAI, which was established in December 2015. OpenAI started as a non-profit organization dedicated to advancing and promoting AI for the benefit of humanity. Musk stepped down from the board in 2018 but remained a donor; he resigned to avoid potential conflicts of interest with his role as CEO of Tesla, which was increasing its focus on AI for self-driving cars. Over the years, OpenAI transitioned into a ‘capped-profit’ entity, OpenAI LP, to attract the capital necessary for advancing its research while maintaining a focus on safe and beneficial AI development.
OpenAI Deals with Publishers
OpenAI has secured licensing agreements with two prominent European publishers, Le Monde and Prisa. These agreements mark a significant step forward in OpenAI’s strategy to collaborate with media organizations rather than engage in contentious disputes over content usage in its AI models. The deals will enable French and Spanish language news content from Le Monde and Prisa to be integrated into ChatGPT and contribute to training OpenAI’s models.
Behind the scenes: As the court deliberates on the case between OpenAI and The New York Times, centered on alleged copyright infringement, OpenAI has taken proactive steps by engaging in collaborative efforts with media outlets.
Leadership Transition at Stability AI: Emad Mostaque Steps Down as CEO
Emad Mostaque has stepped down as CEO of Stability AI to pursue decentralized AI, with Shan Shan Wong and Christian Laforte appointed as interim co-CEOs while the board searches for a permanent replacement. Mostaque expressed pride in the company’s achievements and emphasized the importance of open, decentralized AI. The change gives Stability AI an opportunity to continue growing while maintaining its commitment to open-source AI.
Between the lines: Emad Mostaque’s departure from Stability AI resonates with the recent leadership transition at OpenAI. Both instances highlight the complexities of maintaining alignment between company vision and leadership decisions.
Our colleague Alexander Shironosov, team leader at Everypixel, has taken a deeper look at recent AI model releases:
LLM:
Grok: xAI open-sourced its AI chatbot, Grok, and it is now available on GitHub and Hugging Face. This decision offers researchers and developers the opportunity to delve into the model, potentially shaping the future trajectory of xAI’s Grok amidst stiff competition from industry giants like OpenAI, Meta, Google, and Microsoft.
DBRX: AI company Databricks introduced DBRX, an open, general-purpose LLM that sets a new state of the art for open LLMs across various standard benchmarks. It surpasses models like GPT-3.5, competes with Gemini 1.0 Pro, and offers capabilities previously limited to closed model APIs, making it a significant advancement for open AI development. Notably, DBRX is efficient, with faster inference and a smaller parameter count than other open models such as Grok-1; a minimal loading sketch follows below.
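As a rough sketch only, here is how one might load the openly released DBRX Instruct checkpoint with Hugging Face transformers. It assumes access to the databricks/dbrx-instruct repository and far more GPU memory than a typical workstation has; the trust_remote_code flag reflects how models shipping custom code were commonly loaded at release and may not be strictly required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"

# Load the tokenizer and the (very large) model, sharding it across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Ask a question using the model's chat template.
messages = [{"role": "user", "content": "In one paragraph, what is a mixture-of-experts LLM?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```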
Jamba: AI21 has introduced Jamba, an SSM-Transformer hybrid that combines Mamba structured state space model (SSM) layers with elements of the traditional Transformer architecture to overcome the limitations of pure SSM models, and it offers a 256K-token context window. The model marks a milestone in LLM development, optimizing memory, throughput, and performance at the same time. With its MoE layers and hybrid structure, Jamba enables efficient deployment and experimentation with long contexts on a single GPU; a structural sketch of the idea follows below.
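The toy PyTorch sketch below shows only the structural idea described above: most mixer layers are state-space-style blocks, attention appears only occasionally, and some MLPs are replaced by a sparse mixture-of-experts layer. It is not AI21’s implementation; the SSM block is stood in for by a simple recurrent layer, and all sizes and ratios are made up.

```python
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba/SSM mixer: a simple recurrent scan with a residual."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.rnn = nn.GRU(d, d, batch_first=True)  # placeholder for the real selective SSM

    def forward(self, x):
        out, _ = self.rnn(self.norm(x))
        return x + out

class ToyAttnBlock(nn.Module):
    """Standard self-attention mixer with a residual."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class ToyMLP(nn.Module):
    """Dense feed-forward layer with a residual."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.ff(self.norm(x))

class ToyMoE(nn.Module):
    """Sparse MLP: each token is routed to a single expert (top-1 routing)."""
    def __init__(self, d, n_experts=4):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(ToyMLP(d) for _ in range(n_experts))

    def forward(self, x):
        chosen = self.router(self.norm(x)).argmax(dim=-1)   # (batch, tokens)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = chosen == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class ToyJambaStack(nn.Module):
    """Mostly SSM mixers with occasional attention; every other MLP is an MoE."""
    def __init__(self, d=64, depth=8):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(depth):
            self.layers.append(ToyAttnBlock(d) if i % 4 == 3 else ToySSMBlock(d))
            self.layers.append(ToyMoE(d) if i % 2 == 1 else ToyMLP(d))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 32, 64)           # (batch, tokens, hidden)
print(ToyJambaStack()(x).shape)      # torch.Size([2, 32, 64])
```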
Claude 3: Anthropic unveiled Claude 3, consisting of three language models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. The company asserts that these models approach ‘near-human’ capability across various cognitive tasks and outperform GPT-4 on 10 AI benchmarks, including MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).
Between the lines: Claude 3 Opus has emerged as the new leader in the chatbot arena, displacing GPT-4. This shift comes amid Anthropic’s efforts to match OpenAI’s models in performance. However, opinions within the AI community are divided, with some experts advocating for a more comprehensive evaluation approach that goes beyond benchmark scores. A minimal example of calling the new models is sketched below.
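For readers who want to try the new models, here is a minimal sketch of calling Claude 3 Opus through Anthropic’s Python SDK. It assumes the anthropic package is installed and an ANTHROPIC_API_KEY environment variable is set; the prompt is just an example.

```python
import anthropic

# Create a client; the API key is read from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Ask Claude 3 Opus a question via the Messages API.
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain the MMLU benchmark in two sentences."}],
)

print(message.content[0].text)
```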
Text-to-Image:
Stable Diffusion 3 Research Paper: Stability AI published a research paper detailing Stable Diffusion 3 (SD3) and the potential of diffusion transformers for text-to-image synthesis. Powered by the new Multimodal Diffusion Transformer (MMDiT) architecture, SD3 surpasses leading models like DALL·E 3 and Midjourney v6 in typography and prompt adherence, according to human evaluations. By using separate sets of weights for image and language representations, SD3 improves text understanding and spelling capabilities; a simplified sketch of this design follows below.
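As an illustration only (not Stability AI’s code), the toy block below mirrors the MMDiT idea: text and image tokens keep their own projection and MLP weights but attend jointly over the concatenated sequence, so the two modalities can exchange information. All names and dimensions are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMMDiTBlock(nn.Module):
    """Separate per-modality weights, joint attention over image + text tokens."""
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.d, self.heads = d, heads
        # Each modality gets its own QKV projection and its own MLP.
        self.qkv_img = nn.Linear(d, 3 * d)
        self.qkv_txt = nn.Linear(d, 3 * d)
        self.mlp_img = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.mlp_txt = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def _joint_attention(self, qkv):
        B, T, _ = qkv.shape
        q, k, v = (t.view(B, T, self.heads, self.d // self.heads).transpose(1, 2)
                   for t in qkv.chunk(3, dim=-1))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(B, T, self.d)

    def forward(self, img, txt):
        # Project each modality with its own weights, then attend over the joint sequence.
        qkv = torch.cat([self.qkv_img(img), self.qkv_txt(txt)], dim=1)
        attn = self._joint_attention(qkv)
        a_img, a_txt = attn.split([img.shape[1], txt.shape[1]], dim=1)
        img = img + a_img
        txt = txt + a_txt
        return img + self.mlp_img(img), txt + self.mlp_txt(txt)

img = torch.randn(2, 16, 64)   # image latent tokens
txt = torch.randn(2, 8, 64)    # text embedding tokens
out_img, out_txt = ToyMMDiTBlock()(img, txt)
print(out_img.shape, out_txt.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 8, 64])
```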
Stable Diffusion 3 Turbo Research Paper: Stability AI also published a paper on SD3-Turbo, a fast distilled variant. Notably, instead of using a DINOv2 discriminator on RGB pixels, SD3-Turbo employs a latent-space discriminator, which speeds up processing and reduces memory usage. The discriminator operates on the intermediate features of a generative model as well as the final results, strengthening the training signal. Other strategies include training on images with various aspect ratios and adding more noise during sampling to improve the model’s grasp of global object structure.
PixArt-Σ: Huawei has unveiled PixArt-Σ (Pixart-Sigma), a Diffusion Transformer model capable of directly generating images at 4K resolution. Trained on images up to 4K with long captions and using a ‘weak-to-strong training’ approach, PixArt-Σ produces higher-fidelity images with improved alignment to text prompts. Despite its small size (0.6B parameters) compared with existing models like SDXL and SD Cascade, PixArt-Σ delivers strong image quality and prompt adherence.
Multimodal LLM / VLM:
Apple: Apple has published a paper examining how various architectural components and data choices affect the performance of multimodal large language models (MLLMs). The study introduces the MM1 family of models, whose largest member, at 30 billion parameters, outperforms existing models of up to 80 billion parameters on key metrics. On certain benchmarks, such as VQA, the MM1 models even outperform GPT-4V and Gemini Ultra. However, no code or weights have been released so far, and it is unclear whether they will be.
MiniGemini: Mini-Gemini leverages both global information from the entire image and fine-grained local details, bolstering image comprehension, reasoning, and generation capabilities within VLM frameworks.
MoAI: MoAI is a large language and vision model (LLVM) that harnesses auxiliary information from external computer-vision modules, such as OCR (optical character recognition) detectors and segmentation models, to improve visual perception. Unlike existing LLVMs that rely primarily on their sheer capacity, MoAI integrates detailed real-world scene understanding from these specialized models. By incorporating their outputs, MoAI achieves strong performance on zero-shot VL tasks without model enlargement or bespoke datasets, excelling in particular at tasks involving object detection, relations, and OCR. A simplified illustration of the general idea follows below.
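The sketch below illustrates only the general idea of handing verbalized outputs from external CV models (object detections, OCR text) to a vision-language model as auxiliary context. MoAI itself fuses this information through learned modules rather than plain prompt text, and the detector outputs here are invented for the example.

```python
def verbalize_aux_outputs(detections, ocr_lines):
    """Turn structured CV outputs into a short textual context block."""
    objects = "; ".join(f"{d['label']} at box {d['box']}" for d in detections)
    text = " | ".join(ocr_lines)
    return f"Detected objects: {objects}\nText read by OCR: {text}"

# Invented example outputs from an object detector and an OCR model.
detections = [
    {"label": "stop sign", "box": (120, 40, 220, 140)},
    {"label": "car", "box": (300, 180, 520, 360)},
]
ocr_lines = ["STOP", "Main St"]

question = "Is the car likely to stop soon, and why?"
prompt = f"{verbalize_aux_outputs(detections, ocr_lines)}\n\nQuestion: {question}"
print(prompt)  # this text would accompany the image when querying the VLM
```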
Miscellaneous
Kate Middleton Photo Scandal
A family photo of the Princess of Wales, released by Kensington Palace, has sparked controversy amid concerns of manipulation, prompting photo agencies such as Getty Images and Reuters to withdraw it from circulation. Intended as a reintroduction of Kate Middleton after her recent hospitalization, the image instead fueled skepticism, with observers noting inconsistencies such as the alignment of Princess Charlotte’s hand and the absence of Kate’s wedding ring. The palace’s attempt to defuse the situation by attributing the inconsistencies to amateur photo editing only underscored how hard it has become to tell real from manipulated content, a problem exacerbated by the proliferation of AI-generated imagery. When the Princess later released a video disclosing her battle with cancer, it faced similar scrutiny.
Why it matters: The AI boom has increased public distrust of online content, giving conspiracy theories room to flourish, with even figures like Kate Middleton becoming the subject of memes. The earlier viral AI-generated image of the Pope in a puffer jacket shocked many and showed how quickly manipulated imagery can spread. Such incidents are likely to keep making headlines and fueling gossip.