Top AI News, September 2024

In this monthly roundup, we highlight the top AI news stories from September:

OpenAI o1

OpenAI has introduced o1, a large language model trained with reinforcement learning to tackle complex reasoning tasks. Unlike typical LLMs, o1 generates an internal chain of thought before responding, allowing it to “think” before delivering an answer. The model ranks in the 89th percentile on competitive programming challenges and exceeds human accuracy on PhD-level science benchmarks.

Between the lines: While OpenAI is still refining o1 for ease of use, an early preview is now available to ChatGPT Plus and Team subscribers and select API users.
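For developers in the API preview, calling o1 looks much like any other chat-completions request. The sketch below builds only the raw JSON request body with the standard library; the model name "o1-preview" and the message shape are assumptions based on OpenAI's usual API conventions rather than details confirmed in this story — check the official API reference before relying on them.

```python
import json

# Hypothetical request body for the o1 preview via OpenAI's chat-completions
# API. The model name "o1-preview" and the message format are assumptions;
# consult OpenAI's API reference for the authoritative values.
def build_o1_request(prompt: str) -> str:
    body = {
        "model": "o1-preview",
        # o1 performs its chain of thought internally, so the request
        # carries only a plain user message.
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

print(build_o1_request("Prove that the sum of two even integers is even."))
```

Early reports also indicated the preview rejected sampling parameters such as temperature, which is why the body above omits them.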

OpenAI’s Advanced Voice

OpenAI has begun rolling out Advanced Voice Mode (AVM) to ChatGPT Plus and Team users, offering a more natural speaking experience. The feature includes five new nature-inspired voices—Arbor, Maple, Sol, Spruce, and Vale—bringing the total to nine. A redesigned blue animated sphere now represents AVM, replacing the old black dots. Enhancements also include improved accent recognition and smoother conversations. AVM users will also benefit from Custom Instructions and Memory, enabling more personalized interactions.

Between the lines: Multimodal features like video and screen sharing are still pending release.

OpenAI CTO Mira Murati Steps Down

OpenAI CTO Mira Murati has announced her departure after six years, citing a desire for “time and space to do my own exploration.” Her tenure spanned the launch of key AI tools like ChatGPT and DALL-E, and her exit marks a major leadership change as two other executives, Bob McGrew (Chief Research Officer) and Barret Zoph (VP of Post-Training), are also stepping down. CEO Sam Altman noted that the decisions were made independently but emphasized that the transitions come as OpenAI is on an upswing, ahead of its Dev Day conference.

Google Unveils Gems and Imagen 3

Google is rolling out two significant updates to its Gemini Advanced platform: Gems and Imagen 3. Gems allows users to create personalized versions of Gemini that act as experts on various topics, streamlining tasks through customizable instructions. Users can set parameters like tone and response length, and pre-made Gems will assist with everything from career planning to writing and coding support. Meanwhile, Imagen 3 brings advanced image generation to the platform, including images of people, though people generation is initially limited to Gemini Advanced, Business, and Enterprise subscribers. The update aims to improve the user experience by addressing prior issues with people generation while ensuring compliance with safety guidelines.

Alibaba Launches Qwen2-VL

Alibaba Cloud has introduced Qwen2-VL, an advanced vision-language model capable of analyzing videos longer than 20 minutes, setting a new benchmark for AI interaction with visual data. The model excels in recognizing handwriting, distinguishing objects, and summarizing video content, even offering near-real-time analysis for live tech support scenarios. Qwen2-VL outperforms established models like Meta’s Llama 3.1 and OpenAI’s GPT-4o in third-party tests, showcasing its potential in applications ranging from automated customer service to complex decision-making tasks. Available in three variants, including two fully open-source models, Qwen2-VL is designed for integration with mobile devices and robots, utilizing architectural innovations like Naive Dynamic Resolution and Multimodal Rotary Position Embedding to enhance visual comprehension.

Meta Unveils Llama 3.2

Meta has launched Llama 3.2, featuring a suite of lightweight vision and text-only large language models (LLMs) designed for edge and mobile devices. The new models, available in sizes of 1B, 3B, 11B, and 90B, support an impressive context length of 128K tokens, making them ideal for local tasks like summarization and instruction following. Notably, the 11B and 90B vision models outperform their text equivalents in image understanding and can be easily fine-tuned for custom applications. Llama 3.2 also introduces the Llama Stack, simplifying deployment across various environments and fostering collaboration with major partners such as AWS, Google Cloud, and Qualcomm.

Pixtral 12B

Mistral has launched its first multimodal model, Pixtral 12B, which can process both images and text. Built on Mistral’s text model Nemo 12B, Pixtral 12B features 12 billion parameters, allowing it to handle tasks like image captioning and object counting from arbitrary images or URLs. The model, about 24GB in size, is available for download and fine-tuning under an Apache 2.0 license via platforms like GitHub and Hugging Face.
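The reported ~24GB download squares with simple arithmetic: 12 billion parameters stored at 16-bit precision take two bytes each. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of Pixtral 12B's checkpoint size: 12 billion
# parameters at 16-bit (2-byte) precision come to roughly 24 GB, matching
# the reported download size.
params = 12_000_000_000
bytes_per_param = 2          # bfloat16 / float16
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB")   # → 24 GB
```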

Between the lines: Mistral’s move into the multimodal space places it alongside competitors like OpenAI and Anthropic, which offer similar models. Despite some uncertainty about the dataset used for training, Mistral continues to push boundaries with free, open-access models. This release follows the company’s recent $645 million funding round, valuing it at $6 billion, as it aims to challenge industry leaders with both open and managed AI solutions.

Qwen2.5-Math: The New Benchmark for Open-Source Mathematical LLMs

Qwen2.5-Math, the latest iteration of the Qwen mathematical language models, offers groundbreaking improvements for solving complex math problems in both English and Chinese. The series, which includes models ranging from 1.5B to 72B parameters, integrates Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR) techniques, enabling higher computational accuracy and deeper algorithmic understanding. Compared to its predecessor, Qwen2-Math, the new series demonstrates significant advancements, particularly with its flagship Qwen2.5-Math-72B-Instruct model outperforming both open-source and leading closed-source models on challenging benchmarks like MATH and AIME.

Between the lines: By leveraging synthesized data and reinforcement learning guided by a reward model, Qwen2.5-Math sets a new standard for AI-driven mathematical reasoning.
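The Tool-Integrated Reasoning idea is that the model delegates exact computation to a code interpreter instead of doing arithmetic in free text. The toy sketch below illustrates only the pattern — the hard-coded snippet stands in for model output, and the prompt/response format is illustrative, not Qwen's actual implementation:

```python
# Toy illustration of Tool-Integrated Reasoning (TIR): rather than computing
# in natural language, the model emits a code snippet, and the harness
# executes it and feeds the exact result back into the reasoning chain.
# The string below stands in for model output; Qwen2.5-Math's real format differs.
model_emitted_code = "result = sum(i * i for i in range(1, 11))"

namespace = {}
exec(model_emitted_code, namespace)   # run the model's "tool call"
print(namespace["result"])            # → 385  (1² + 2² + ... + 10²)
```

Executing the snippet yields an exact answer the model can quote verbatim, which is why TIR improves computational accuracy over pure chain-of-thought.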

AlphaProteo

Google DeepMind has unveiled AlphaProteo, a cutting-edge AI system that designs high-strength protein binders for biological and health research. Unlike traditional methods, AlphaProteo can generate novel proteins that successfully bind to target molecules, significantly enhancing drug development, disease understanding, and more. Notably, it has achieved binding success rates of up to 88% for viral proteins and outperforms existing design methods by up to 300 times in binding affinity for seven tested targets, including the SARS-CoV-2 spike protein and VEGF-A, which is linked to cancer. Trained on extensive protein data, AlphaProteo streamlines the time-intensive protein design process, marking a significant advancement in the field and paving the way for more efficient biological research.

Between the lines: As the system continues to evolve, it promises to tackle complex challenges in drug design and other applications while adhering to rigorous safety and ethical standards.

Lionsgate x Runway

Lionsgate has struck a deal with AI startup Runway to use generative AI technology as a tool for filmmakers, aiming to save “millions” in production costs. Runway will develop a custom AI model based on Lionsgate’s extensive film and TV library, enabling filmmakers to generate and enhance cinematic video for pre- and post-production.

Between the lines: Runway has also unveiled The Hundred Film Fund, offering up to $1 million in grants for filmmakers using generative AI. The fund aims to produce 100 AI-powered short films and feature-length projects, with grants ranging from $5,000 to $1 million. Runway’s CEO, Cris Valenzuela, emphasized that the initiative is purely about promoting AI as a creative tool, with no ownership or distribution rights retained by Runway.

OpenAI Co-Founder Launches Safety-Focused AI Startup SSI

Safe Superintelligence (SSI), co-founded by OpenAI’s former chief scientist Ilya Sutskever, has successfully raised $1 billion to advance the development of safe AI systems aimed at surpassing human capabilities. SSI’s mission is to create secure AI solutions amid growing concerns about AI safety.

Dejaview Predicts Crimes Before They Happen

South Korea’s Electronics and Telecommunications Research Institute has unveiled Dejaview, an AI system designed to predict crimes before they occur through real-time CCTV analysis. By assessing factors like location, time of day, and historical crime data, Dejaview can map high-risk zones and signal when individuals may be on the verge of reoffending, boasting an 82.8% accuracy rate in initial trials.

Between the lines: While the system holds promise for enhancing public safety, its Orwellian implications raise concerns about privacy and surveillance.
