Top AI News, August 2024

In this monthly roundup, we highlight the top AI news stories from August:

OpenAI, Anthropic sign deals with US government

OpenAI and Anthropic have signed landmark agreements with the U.S. government to advance research, testing, and evaluation of their AI models. These first-of-their-kind deals, announced by the U.S. Artificial Intelligence Safety Institute, aim to rigorously assess the safety and ethical implications of AI technologies before they are widely deployed. The collaboration underscores a growing focus on responsible AI development amid increasing regulatory scrutiny, with California legislators set to vote on new AI regulations.

The context behind: The agreements also involve collaboration with the U.K. AI Safety Institute, reflecting a broader effort to establish global standards for AI safety.

Wiley to Earn $44M from AI Deals

Wiley is set to earn $44 million from AI rights deals, but authors won’t have the option to opt out of having their work used to train Large Language Models (LLMs). The academic publisher has already made $23 million from these deals and expects to earn another $21 million this financial year. While Wiley asserts that authors will be compensated according to their contractual terms, the lack of an opt-out option has sparked concerns, especially following similar controversies involving other publishers.

The context behind: Wiley has not disclosed details about the tech companies involved, citing confidentiality, but emphasized its commitment to protecting authors’ rights.

Stable Fast 3D

Stability AI has unveiled Stable Fast 3D, a model capable of generating high-quality 3D assets from a single image in 0.5 seconds. This model marks a significant leap in 3D reconstruction, offering rapid prototyping tools ideal for game developers, virtual reality creators, and professionals in retail, architecture, and design. Building on the TripoSR framework, Stable Fast 3D introduces architectural improvements that deliver detailed 3D assets with reduced processing times.

FLUX.1

Black Forest Labs has launched Flux, a suite of text-to-image models that might challenge industry leaders like Midjourney. The flagship model, FLUX.1 [pro], is built on a 12 billion parameter transformer architecture and features innovative techniques like hybrid multimodal and parallel diffusion blocks, rotary positional embeddings, and flow matching. These advances promise superior image detail, prompt adherence, and output diversity.

Between the lines: While Flux is positioned as a powerful alternative with open-source accessibility, the real question remains whether it can truly surpass Midjourney’s established dominance in artistic flair and user experience.
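The rotary positional embeddings (RoPE) mentioned above can be illustrated with a minimal NumPy sketch. This is a toy, self-contained illustration of the general RoPE technique, not FLUX.1’s actual implementation; the dimension sizes and the `base` value are illustrative assumptions. Each pair of feature dimensions is rotated by an angle proportional to the token’s position, so the dot product between a rotated query and key depends only on their relative offset.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at integer position pos.

    Dimension pairs (2i, 2i+1) are rotated by angle pos * theta_i, where
    theta_i = base**(-2i/d). Because rotations compose, the dot product of
    a rotated query and key depends only on the positional offset m - n.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per dim pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin        # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotation is norm-preserving and only the positional difference survives in the query–key dot product, attention scores become translation-invariant, which is the property transformer designers are after.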

Ideogram 2

Brain Titan has launched Ideogram 2.0, a text-to-image model now available to all users. Ideogram 2.0 boasts enhanced image quality and text rendering across five distinct styles: Generic, Realistic, Design, 3D, and Anime. It outperforms competitors like Flux Pro and DALL·E 3, especially in realism and text accuracy. The update includes an iOS app, a public API beta, and a vast searchable library of over one billion images, making it a versatile tool for creators in fields ranging from graphic design to e-commerce.

The context behind: Exactly a year ago, Brain Titan made a successful debut with the original Ideogram, the first social network for AI-generated art, which rapidly attracted over 90,000 users and showcased its innovative text-to-image capabilities.

Google’s AI chatbot

Google’s Gemini AI chatbot, which allows users to query their Gmail inbox, is now rolling out to Android, with iOS support coming soon. The Gmail Q&A feature, introduced at Google I/O, lets users ask Gemini to find specific emails, display unread messages, or summarize topics in their inbox.

Between the lines: Users need to subscribe to Google One AI Premium or have a Google Workspace plan with specific add-ons. As with all AI tools, users are advised to verify the accuracy of the chatbot’s responses.

OpenAI’s Tool for Detecting AI-Written Text

OpenAI has developed a tool capable of detecting AI-generated text, such as essays written by ChatGPT, with 99.9% accuracy. Despite growing concerns over students using AI to cheat, the tool has been held back due to internal debates within the company.

Between the lines: According to sources and internal documents, the technology has been ready for release for about a year, but OpenAI has yet to make it available, with one insider noting that “it’s just a matter of pressing a button.”

Our colleague Alexander Shironosov, R&D Team Lead at Everypixel, has deepened our exploration of recent releases in AI models:

Vision models:

  • Sapiens: Meta Reality Labs has unveiled Sapiens, a suite of vision models designed for advanced human-centric image processing tasks, including 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Trained on an extensive dataset of 300 million images featuring people, these models achieve state-of-the-art accuracy and offer high-resolution 1K inference. With a scalable architecture ranging from 0.3 to 2 billion parameters, Sapiens consistently outperforms existing benchmarks in human-centric vision tasks.

Video Generation:

  • CogVideoX: The Tsinghua University Department of Computer Science and Technology (THUDM) has released CogVideoX, a video generation model available in 2 billion and 5 billion parameter variants, delivering impressive results for its relatively small size. This open-source model excels in creating high-quality, captioned videos from text prompts, generating vivid and coherent scenes.
  • MiniMax video-01: China’s latest AI video generator, MiniMax video-01, is quickly gaining attention for its ability to produce hyper-realistic footage, especially when it comes to accurately rendering human movements—an area where many similar tools struggle. While it competes with other top models like Runway Gen-3 and Kling, its unique strength lies in capturing realistic human actions, making it a promising contender in the evolving landscape of AI-generated video content.

Open Source Models

  • Phi-3.5: Microsoft’s latest Phi-3.5 model family introduces powerful updates to its Small Language Models (SLMs), offering enhanced performance and multi-lingual support while remaining cost-effective. The new Phi-3.5-MoE model stands out with its Mixture-of-Experts architecture, utilizing 16 experts and 6.6B active parameters to outperform larger models in language understanding and reasoning across 20+ languages. Additionally, Phi-3.5-mini boasts a 128K context length and excels in multi-lingual tasks, making it ideal for long-context applications. Meanwhile, Phi-3.5-vision pushes the boundaries of multi-frame image understanding, significantly improving performance in single-image benchmarks.
  • Jamba 1.5 Mini and Jamba 1.5 Large: These models, built on a hybrid architecture that combines the strengths of Transformer and Mamba architectures, offer unmatched speed, efficiency, and performance in long-context language models. The Jamba 1.5 Large, a Mixture-of-Experts (MoE) model with 398B total parameters and 94B active parameters, is designed to handle complex reasoning tasks with high quality and efficiency. Both models utilize a true context window of 256K tokens, the largest currently available under an open license, and have demonstrated superior performance in latency tests against similar models.
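The Mixture-of-Experts (MoE) design behind Phi-3.5-MoE and Jamba 1.5 Large can be sketched in a few lines: a router scores every expert for each token, but only the top-k experts actually run, so compute scales with the number of active experts rather than the total parameter count. The following is a toy NumPy sketch of generic top-k routing; shapes, names, and the routing details are illustrative assumptions, not either model’s real implementation.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Toy Mixture-of-Experts layer with top-k routing.

    x:              (tokens, dim) input activations
    expert_weights: (num_experts, dim, dim), one linear layer per expert
    gate_weights:   (dim, num_experts) router producing per-expert scores
    Only k experts run per token; the rest are skipped entirely.
    """
    scores = x @ gate_weights                       # (tokens, num_experts)
    topk = np.argsort(scores, axis=-1)[:, -k:]      # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = scores[t, topk[t]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()                        # softmax over chosen experts
        for w, e in zip(probs, topk[t]):
            out[t] += w * (x[t] @ expert_weights[e])
    return out
```

With 16 experts and k=2, for example, each token pays for only two expert forward passes, which is why an MoE model’s active parameter count (6.6B for Phi-3.5-MoE, 94B for Jamba 1.5 Large) is far below its total.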

VLM

  • Qwen2-VL: The new Qwen2-VL models introduce cutting-edge innovations in multi-modal AI, featuring 2B, 7B, and a powerful 72B parameter variant. Leveraging Naive Dynamic Resolution, these models process images at arbitrary resolutions, minimizing visual information loss, while Multimodal Rotary Position Embedding (M-ROPE) enables seamless integration of text, images, and videos. Notably, the 72B version surpasses GPT-4 and Claude 3.5 Sonnet across multiple benchmarks, showcasing its superior capabilities. API access to the 72B model offers unparalleled performance for advanced AI applications.

Miscellaneous

AI-Generated Ads with Tom Hanks

Tom Hanks recently alerted his followers to fraudulent ads circulating online that use his likeness without consent, generated through AI technology. In an Instagram post, the actor emphasized that these ads, which falsely promote miracle cures, have been created without his permission. This incident is part of a growing trend of AI-generated ads featuring unauthorized celebrity endorsements, affecting other high-profile figures like Elon Musk, Taylor Swift, and Scarlett Johansson.
