In this monthly recap, we highlight the top AI news stories from December:
Google Announced Gemini
Google announced Gemini, its most advanced AI model, capable of multimodal tasks such as understanding text, code, audio, images, and video. Gemini comes in three versions: Ultra for complex tasks, Pro for a wide range of tasks, and Nano for on-device efficiency. According to Google, Gemini Ultra outperforms human experts on the MMLU language understanding benchmark and achieves state-of-the-art results on coding tasks. Google emphasizes responsibility and safety, conducting comprehensive evaluations, collaborating with external experts, and addressing challenges like bias and toxicity. Gemini is being integrated into Google products and will be available to developers through Google Cloud.
In addition to Gemini, Google introduced the Gemini API and a set of AI tools for developers and businesses. The Gemini API is now accessible through Vertex AI, and the accompanying tools include Imagen 2, the upgraded text-to-image diffusion technology from Google DeepMind; MedLM, a family of healthcare-focused foundation models; Duet AI for Developers, for AI-assisted code development; and Duet AI in Security Operations, for enhanced security operations.
The context behind: After ChatGPT’s launch, Google sharply accelerated its AI product development. The effort drew in numerous employees and brought co-founders Larry Page and Sergey Brin back into active roles. This push led to the release of Bard and Gemini this year.
FunSearch by Google DeepMind
Google DeepMind unveiled FunSearch, a method that leverages LLMs to discover solutions in mathematics and computer science. FunSearch pairs a pre-trained LLM, which proposes creative solutions in the form of computer code, with an automated “evaluator” that guards against incorrect results. Through iterative refinement between these two components, initial solutions evolve into new knowledge. The system searches for “functions” written in computer code, hence the name FunSearch.
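The propose-and-evaluate loop described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the names `propose`, `evaluate`, and `search` are hypothetical, the LLM is replaced by a random mutation of coefficients, and the target problem (fitting x * x + 3) is invented purely for illustration.

```python
import random

# Candidate "programs" are Python source strings that define f(x).
TEMPLATE = "def f(x):\n    return {a} * x * x + {b} * x + {c}\n"


def render(coeffs):
    """Turn a coefficient triple into program source."""
    a, b, c = coeffs
    return TEMPLATE.format(a=a, b=b, c=c)


def evaluate(program_src):
    """Automated evaluator: run the program and score it, or reject it.

    Returns None for programs that fail to compile or run, so flawed
    generations never enter the pool. The toy target is x * x + 3.
    """
    namespace = {}
    try:
        exec(program_src, namespace)
        f = namespace["f"]
        return -sum((f(x) - (x * x + 3)) ** 2 for x in range(-5, 6))
    except Exception:
        return None


def propose(parent_coeffs, rng):
    """Stand-in for the LLM (assumption): mutate a parent program.

    A real system would prompt a pre-trained LLM with high-scoring
    programs. Here one coefficient is nudged at random, and about 10%
    of outputs are deliberately broken to mimic flawed generations.
    """
    child = list(parent_coeffs)
    child[rng.randrange(3)] += rng.choice([-1, 1])
    src = render(child)
    if rng.random() < 0.1:
        src = src.rstrip() + " +\n"  # dangling operator: a syntax error
    return tuple(child), src


def search(iterations=500, seed=0):
    """Iterate between proposer and evaluator, keeping scored programs."""
    rng = random.Random(seed)
    start = (0, 0, 0)
    pool = {start: evaluate(render(start))}
    for _ in range(iterations):
        # Sample a parent from the five best programs found so far.
        top = sorted(pool.items(), key=lambda kv: kv[1], reverse=True)[:5]
        parent = rng.choice(top)[0]
        child, src = propose(parent, rng)
        score = evaluate(src)
        if score is not None:  # the evaluator filters broken candidates
            pool[child] = score
    best = max(pool, key=pool.get)
    return best, pool[best]


if __name__ == "__main__":
    coeffs, score = search()
    print(render(coeffs), "score:", score)
```

The point of the design is the division of labor: the proposer may be wrong as often as it likes, because only programs that pass the evaluator are kept and reused as parents.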
Between the lines: Despite a wave of enthusiastic news reports about FunSearch, Gary Marcus, a leading AI expert, pointed out that FunSearch may not be the milestone in scientific discovery that the PR team made it out to be. While an LLM assisted in solving a math problem, it was part of a narrow, prescribed system.
Phi-2 by Microsoft
Microsoft introduced Phi-2, a 2.7-billion-parameter language model showcasing outstanding reasoning and language understanding capabilities. Despite its compact size, Phi-2 outperforms Gemini Nano, Mistral 7B, and Llama 2 models on the benchmarks Microsoft reported. Its small size makes Phi-2 an ideal platform for researchers exploring mechanistic interpretability, safety enhancements, and fine-tuning across various tasks.
Why it matters:
As the industry moves forward, the future seems to belong to more compact models, like Phi-2, offering practical advantages and easier access. They are simpler to use, allowing for additional training on specific data or straightforward integration into production. Easier access empowers a broader range of researchers, even those without million-dollar computing budgets, to use these models and innovate with them.
Alexander Shironosov, Team Leader of Everypixel
Apple quietly released an open source multimodal LLM
Apple quietly released an open source multimodal LLM called Ferret in October 2023. While the initial release didn’t gain much attention, it has recently sparked interest after Bart de Witte’s post on X.
Between the lines: The unexpected entry of Apple into the open source LLM landscape surprised many in the AI community, given Apple’s traditional reputation as a “walled garden” company.
Midjourney V6
Midjourney unveiled an alpha test of its new version, Midjourney V6, which produces more realistic, detailed images and can render short passages of legible text within them. Other improvements include more accurate prompt following, better coherence, enhanced model knowledge, improved image prompting and remix, and improved upscalers.
Why it matters: Midjourney V6 garnered positive feedback for its vivid, detailed results. The difference is easy to see when comparing images generated by Midjourney V6 and Midjourney V3 from the same prompt.
Meta launched an AI-powered image generator
Meta launched a standalone AI-powered image generator called “Imagine with Meta.” Powered by Meta’s existing Emu image generation model, Imagine with Meta allows users to create high-resolution images from text prompts.
Why it matters: To increase transparency and traceability, Meta also added visible watermarks to content generated by Imagine with Meta, with future plans to incorporate invisible watermarks.
PeopleMaker in Canva
PeopleMaker, a GenAI model by vAIsual known for creating realistic, GDPR-compliant AI-generated images of people, is now integrated into Canva. PeopleMaker stands out for its diverse and legally sound dataset, featuring only high-resolution in-house photos of real-life models captured from various angles and with various expressions. The dataset is designed to be inclusive, with ongoing efforts to represent diverse ethnic and racial backgrounds. Notably, all participants have signed a comprehensive biometric model release, ensuring legal, ethical, and transparent AI model training. Canva users can now easily incorporate PeopleMaker into their design projects.
Why it matters: The industry is gradually shifting towards leveraging licensed content for AI generators, signaling a movement towards increased transparency and legality in AI applications.
The legal aspects of AI training and content use are likely to become clearer in 2024. We may see where the existing lawsuits end up. These precedents will define the foundation for the legal landscape of the industry. I’m optimistic that the legal community will officially endorse a perspective that takes into account not only the interests of AI providers, but also those of the creators whose work is used in training algorithms.
Dmitry Shironosov, CEO of Everypixel
The New York Times Sues OpenAI and Microsoft
The New York Times filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that millions of its articles were used without permission to train AI models, including ChatGPT. The lawsuit claims that the AI models produced by OpenAI and Microsoft now compete with The Times as a source of information. The complaint argues that OpenAI and Microsoft are “free-riding on The Times’s massive investment in its journalism,” accusing them of using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. The suit seeks “billions of dollars in statutory and actual damages” and demands the destruction of chatbot models and training data that use copyrighted material from The Times.
The context behind: At the same time, The New York Times is actively delving into AI technology by hiring an editorial director for AI initiatives. The purpose is to create guidelines for incorporating AI into the newsroom and explore various ways to integrate this technology into the company’s journalism.
McDonald’s x Google
McDonald’s, a non-tech brand, is making a significant move into the AI landscape. McDonald’s and Google Cloud announced a strategic global partnership to integrate Google Cloud technology across thousands of McDonald’s restaurants worldwide. The collaboration aims to enhance McDonald’s restaurant technology platform, speed up innovation, and improve customer experiences. While specific details about the application of AI have not been provided, McDonald’s mentions that it will involve enhancements to its mobile app, loyalty program, and self-service kiosks.
ChatGPT Is the Most-Viewed Wikipedia Page of 2023
Wikipedia’s list of most-viewed pages for 2023 is led by ChatGPT, which accumulated over 49.4 million views, highlighting global interest in AI innovation. The page’s popularity coincided with ChatGPT’s rapid growth: it reached 100 million active users in January 2023. ChatGPT also led the top 50 AI tools, which collectively drew over 24 billion visits from September 2022 to August 2023, accounting for over 60% of the analyzed traffic.
GatesNotes about 2024
Bill Gates shared his notes about 2024, drawing widespread attention and making headlines, as always. He anticipates a pivotal moment in 2024, emphasizing the transformative power of AI in accelerating innovation. Highlighting AI’s impact on drug discovery and healthcare, Gates discusses ongoing projects, including AI for combating antibiotic resistance, personalized AI tutors, AI-assisted treatment of high-risk pregnancies, and AI-aided HIV risk assessment. He reflects on AI’s potential to bridge global health disparities and outlines the Gates Foundation’s commitment to major initiatives. Notable among them is addressing malnutrition through probiotic interventions for infants, with the aim of improving child health through microbiome advances.