Top AI News, August 2024

In this monthly roundup, we highlight the top AI news stories from August:

OpenAI, Anthropic sign deals with US government

OpenAI and Anthropic have signed landmark agreements with the U.S. government to advance research, testing, and evaluation of their AI models. These first-of-their-kind deals, announced by the U.S. Artificial Intelligence Safety Institute, aim to rigorously assess the safety and ethical implications of AI technologies before they are widely deployed. The collaboration underscores a growing focus on responsible AI development amid increasing regulatory scrutiny, with California legislators set to vote on new AI regulations.

The context behind: The agreements also involve collaboration with the U.K. AI Safety Institute, reflecting a broader effort to establish global standards for AI safety.

Wiley to Earn $44M from AI Deals

Wiley is set to earn $44 million from AI rights deals, but authors won’t have the option to opt out of having their work used to train Large Language Models (LLMs). The academic publisher has already made $23 million from these deals and expects to earn another $21 million this financial year. While Wiley asserts that authors will be compensated according to their contractual terms, the lack of an opt-out option has sparked concerns, especially following similar controversies involving other publishers.

The context behind: Wiley has not disclosed details about the tech companies involved, citing confidentiality, but emphasized its commitment to protecting authors’ rights.

Stable Fast 3D

Stability AI has unveiled Stable Fast 3D, a model capable of generating high-quality 3D assets from a single image in 0.5 seconds. This model marks a significant leap in 3D reconstruction, offering rapid prototyping tools ideal for game developers, virtual reality creators, and professionals in retail, architecture, and design. Building on the TripoSR framework, Stable Fast 3D introduces architectural improvements that deliver detailed 3D assets with reduced processing times.

FLUX.1

Black Forest Labs has launched Flux, a suite of text-to-image models that might challenge industry leaders like Midjourney. The flagship model, FLUX.1 [pro], is built on a 12 billion parameter transformer architecture and features innovative techniques like hybrid multimodal and parallel diffusion blocks, rotary positional embeddings, and flow matching. These advances promise superior image detail, prompt adherence, and output diversity.

Between the lines: While Flux is positioned as a powerful alternative with open-source accessibility, the real question remains whether it can truly surpass Midjourney’s established dominance in artistic flair and user experience.
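The rotary positional embeddings (RoPE) mentioned above can be illustrated with a minimal NumPy sketch. This is a toy, self-contained illustration of the general RoPE technique, not FLUX.1’s actual implementation; the dimension sizes and the `base` value are illustrative assumptions. Each pair of feature dimensions is rotated by an angle proportional to the token’s position, so the dot product between a rotated query and key depends only on their relative offset.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at integer position pos.

    Dimension pairs (2i, 2i+1) are rotated by angle pos * theta_i, where
    theta_i = base**(-2i/d). Because rotations compose, the dot product of
    a rotated query and key depends only on the positional offset m - n.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per dim pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin        # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotation is norm-preserving and only the positional difference survives in the query–key dot product, attention scores become translation-invariant, which is the property transformer designers are after.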

Ideogram 2

Brain Titan has launched Ideogram 2.0, a text-to-image model now available to all users. Ideogram 2.0 boasts enhanced image quality and text rendering across five distinct styles: Generic, Realistic, Design, 3D, and Anime. It outperforms competitors like Flux Pro and DALL·E 3, especially in realism and text accuracy. The update includes an iOS app, a public API beta, and a vast searchable library of over one billion images, making it a versatile tool for creators in fields ranging from graphic design to e-commerce.

The context behind: Exactly a year ago, Brain Titan made a successful debut with the original Ideogram, the first social network for AI-generated art, which rapidly attracted over 90,000 users and showcased its innovative text-to-image capabilities.

Google’s AI chatbot

Google’s Gemini AI chatbot, which allows users to query their Gmail inbox, is now rolling out to Android, with iOS support coming soon. The Gmail Q&A feature, introduced at Google I/O, lets users ask Gemini to find specific emails, display unread messages, or summarize topics in their inbox.

Between the lines: Users need to subscribe to Google One AI Premium or have a Google Workspace plan with specific add-ons. As with all AI tools, users are advised to verify the accuracy of the chatbot’s responses.

OpenAI’s Tool for Detecting AI-Written Text

OpenAI has developed a tool capable of detecting AI-generated text, such as essays written by ChatGPT, with 99.9% accuracy. Despite growing concerns over students using AI to cheat, the tool has been held back due to internal debates within the company.

Between the lines: According to sources and internal documents, the technology has been ready for release for about a year, but OpenAI has yet to make it available, with one insider noting that “it’s just a matter of pressing a button.”

Our colleague Alexander Shironosov, R&D Team Lead at Everypixel, has deepened our exploration of recent releases in AI models:

Vision models:

  • Sapiens: Meta Reality Labs has unveiled Sapiens, a suite of vision models designed for advanced human-centric image processing tasks, including 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Trained on an extensive dataset of 300 million images featuring people, these models achieve state-of-the-art accuracy and offer high-resolution 1K inference. With a scalable architecture ranging from 0.3 to 2 billion parameters, Sapiens consistently outperforms existing benchmarks in human-centric vision tasks.

Video Generation:

  • CogVideoX: The Tsinghua University Department of Computer Science and Technology (THUDM) has released CogVideoX, a video generation model available in 2 billion and 5 billion parameter variants, delivering impressive results for its relatively small size. This open-source model excels in creating high-quality, captioned videos from text prompts, generating vivid and coherent scenes.
  • MiniMax video-01: China’s latest AI video generator, MiniMax video-01, is quickly gaining attention for its ability to produce hyper-realistic footage, especially when it comes to accurately rendering human movements—an area where many similar tools struggle. While it competes with other top models like Runway Gen-3 and Kling, its unique strength lies in capturing realistic human actions, making it a promising contender in the evolving landscape of AI-generated video content.

Open Source Models

  • Phi-3.5: Microsoft’s latest Phi-3.5 model family introduces powerful updates to its Small Language Models (SLMs), offering enhanced performance and multi-lingual support while remaining cost-effective. The new Phi-3.5-MoE model stands out with its Mixture-of-Experts architecture, utilizing 16 experts and 6.6B active parameters to outperform larger models in language understanding and reasoning across 20+ languages. Additionally, Phi-3.5-mini boasts a 128K context length and excels in multi-lingual tasks, making it ideal for long-context applications. Meanwhile, Phi-3.5-vision pushes the boundaries of multi-frame image understanding, significantly improving performance in single-image benchmarks.
  • Jamba 1.5 Mini and Jamba 1.5 Large: These models, built on a hybrid architecture that combines the strengths of Transformer and Mamba architectures, offer unmatched speed, efficiency, and performance in long-context language models. The Jamba 1.5 Large, a Mixture-of-Experts (MoE) model with 398B total parameters and 94B active parameters, is designed to handle complex reasoning tasks with high quality and efficiency. Both models utilize a true context window of 256K tokens, the largest currently available under an open license, and have demonstrated superior performance in latency tests against similar models.
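The Mixture-of-Experts (MoE) design behind Phi-3.5-MoE and Jamba 1.5 Large can be sketched in a few lines: a router scores every expert for each token, but only the top-k experts actually run, so compute scales with the number of active experts rather than the total parameter count. The following is a toy NumPy sketch of generic top-k routing; shapes, names, and the routing details are illustrative assumptions, not either model’s real implementation.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Toy Mixture-of-Experts layer with top-k routing.

    x:              (tokens, dim) input activations
    expert_weights: (num_experts, dim, dim), one linear layer per expert
    gate_weights:   (dim, num_experts) router producing per-expert scores
    Only k experts run per token; the rest are skipped entirely.
    """
    scores = x @ gate_weights                       # (tokens, num_experts)
    topk = np.argsort(scores, axis=-1)[:, -k:]      # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = scores[t, topk[t]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()                        # softmax over chosen experts
        for w, e in zip(probs, topk[t]):
            out[t] += w * (x[t] @ expert_weights[e])
    return out
```

With 16 experts and k=2, for example, each token pays for only two expert forward passes, which is why an MoE model’s active parameter count (6.6B for Phi-3.5-MoE, 94B for Jamba 1.5 Large) is far below its total.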

VLM

  • Qwen2-VL: The new Qwen2-VL models introduce cutting-edge innovations in multi-modal AI, featuring 2B, 7B, and a powerful 72B parameter variant. Leveraging Naive Dynamic Resolution, these models process images at arbitrary resolutions, minimizing visual information loss, while Multimodal Rotary Position Embedding (M-ROPE) enables seamless integration of text, images, and videos. Notably, the 72B version surpasses GPT-4 and Claude 3.5 Sonnet across multiple benchmarks, showcasing its superior capabilities. API access to the 72B model offers unparalleled performance for advanced AI applications.

Miscellaneous

AI-Generated Ads with Tom Hanks

Tom Hanks recently alerted his followers to fraudulent ads circulating online that use his likeness without consent, generated through AI technology. In an Instagram post, the actor emphasized that these ads, which falsely promote miracle cures, have been created without his permission. This incident is part of a growing trend of AI-generated ads featuring unauthorized celebrity endorsements, affecting other high-profile figures like Elon Musk, Taylor Swift, and Scarlett Johansson.
