In this monthly roundup, we spotlight the top AI news stories from January:
OpenAI vs. The New York Times Lawsuit
The New York Times has filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that millions of its articles were used without permission to train AI models, including ChatGPT. OpenAI has responded, dismissing the claims as meritless.
OpenAI, affirming its commitment to supporting journalism, outlined four key points in its defense: it collaborates with news organizations to create mutually beneficial opportunities; training AI models is fair use, though publishers are offered an opt-out; it is actively working to minimize the rare “regurgitation” bug; and The New York Times is not telling the full story.
In its complaint, The New York Times provides examples of ChatGPT reproducing its articles.
However, OpenAI counters that the regurgitations induced by The New York Times appear to come from years-old articles that are widely available on third-party websites. OpenAI suggests that The New York Times intentionally manipulated its prompts, instructed the model to regurgitate, or cherry-picked its examples from many attempts.
Between the lines: While engaged in a legal dispute with OpenAI, The New York Times is actively building a team to explore AI in its newsroom. Having recently appointed a head for AI initiatives, the company is currently recruiting engineers and editors. Emphasizing the coexistence of AI tools and human journalism, The New York Times plans to prototype uses of GenAI and other machine-learning techniques to help with reporting.
In the midst of OpenAI’s legal tussle with The New York Times, some details about OpenAI’s news publisher deals have emerged. The company has been trying to get more news organizations to sign licensing deals to train AI models, offering between $1 million and $5 million annually. This is one of the first signs of how much AI companies plan to pay for licensed materials.
OpenAI’s Releases
OpenAI launched the GPT Store, a platform designed to help users find or build custom versions of ChatGPT. The store, available to ChatGPT Plus, Team, and Enterprise users, showcases a variety of GPTs, including categories like DALL-E, writing, research, programming, education, and lifestyle. OpenAI encourages users to contribute to the store, offering a GPT builder revenue program set to launch in Q1, where builders can earn based on GPT usage.
OpenAI also unveiled two new embedding models, text-embedding-3-small and text-embedding-3-large, with improved performance; text-embedding-3-small is priced 5X lower than its predecessor, text-embedding-ada-002. Other updates include new GPT-4 Turbo and GPT-3.5 Turbo models, a new 16k context version, and an improved text moderation model. In addition, OpenAI introduced new tools for managing API usage and cut GPT-3.5 Turbo prices (by 50% for input tokens and 25% for output tokens) to make it more accessible to developers.
Between the lines: Since the announcement of GPTs two months ago, users have already generated over 3 million custom versions.
Copilot Key
Microsoft is set to introduce a new Copilot key on AI-powered Windows PCs, the first significant change to the Windows PC keyboard in nearly three decades. Placed alongside the Windows key, the Copilot key aims to weave AI seamlessly into Windows, a step toward a more personalized and intelligent computing experience.
The context behind: The move follows Microsoft’s announcement in September 2023 that a new version of Windows 11 would build the AI companion Copilot into the operating system. Microsoft has shown a strong commitment to integrating new technologies into its products, as the Copilot key and Bing illustrate. It also stands out for rapidly bringing its products to other ecosystems, as shown by the announcement of Microsoft 365 apps for Apple Vision Pro. This proactive approach contrasts sharply with major services like YouTube and Netflix, which won’t have dedicated apps on the Vision Pro.
Kin.art
A new tool, Kin.art, aims to protect artists’ portfolios from AI scraping with a two-part defense.
The method combines image segmentation and label fuzzing. Image segmentation breaks the artist’s image into smaller pieces and scrambles them, making it hard for AI algorithms to scrape and learn from it. Label fuzzing scrambles the metadata and text associated with the image, further hindering AI training algorithms from learning accurately.
Kin.art claims to be faster than other tools, taking only milliseconds to apply its defense. The platform is free for artists, and the company plans to generate revenue by charging a low fee on artworks sold or monetized through its platform.
Why it matters: The battle between content owners and AI companies is intensifying, prompting creators to take measures to protect their work.
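Kin.art has not published its algorithm here, so the following is only a toy illustration of the general idea of cutting an image into tiles and scrambling them. It assumes a grid of equal square tiles and a seeded permutation that acts like a key; the function names (`split_tiles`, `scramble`, `unscramble`) and the integer-grid "image" are hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # toy grayscale image: a list of pixel rows


def split_tiles(img: Image, t: int) -> List[Image]:
    """Cut an image whose sides are multiples of t into t x t tiles (row-major)."""
    h, w = len(img), len(img[0])
    assert h % t == 0 and w % t == 0, "image sides must be multiples of the tile size"
    return [
        [row[c:c + t] for row in img[r:r + t]]
        for r in range(0, h, t)
        for c in range(0, w, t)
    ]


def join_tiles(tiles: List[Image], cols: int) -> Image:
    """Reassemble row-major tiles into one image, `cols` tiles per row."""
    out: Image = []
    for i in range(0, len(tiles), cols):
        band = tiles[i:i + cols]
        for y in range(len(band[0])):
            out.append([px for tile in band for px in tile[y]])
    return out


def _permutation(n: int, seed: int) -> List[int]:
    # Hypothetical "key": the same seed always yields the same shuffle.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def scramble(img: Image, t: int, seed: int) -> Image:
    tiles = split_tiles(img, t)
    order = _permutation(len(tiles), seed)
    # Position `pos` of the output holds original tile `order[pos]`.
    return join_tiles([tiles[i] for i in order], len(img[0]) // t)


def unscramble(img: Image, t: int, seed: int) -> Image:
    tiles = split_tiles(img, t)
    order = _permutation(len(tiles), seed)
    restored: List[Image] = [[] for _ in tiles]
    for pos, src in enumerate(order):
        restored[src] = tiles[pos]  # invert the permutation
    return join_tiles(restored, len(img[0]) // t)
```

A scraper that sees only the scrambled grid gets the right pixels in the wrong places, while anyone holding the seed can restore the original exactly. A real system would also need to handle image sizes that are not tile multiples and to decide who holds the key.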
China Accelerates AI Approval
China has approved over 40 AI models for public use in the past six months as part of its effort to catch up with the U.S. in AI development. In the latest round, regulators approved 14 large language models, with recipients including Xiaomi Corp, 4Paradigm, and 01.AI. Beijing introduced the approval process last August, requiring tech companies to obtain regulatory approval for their LLMs before releasing them to the public.
Why it matters: The move reflects China’s approach to AI: accelerating development while keeping it under state oversight and control.
Mark Zuckerberg’s AGI Plan
Mark Zuckerberg, CEO of Meta, has declared his intent to pursue Artificial General Intelligence (AGI), joining the race alongside OpenAI and Google. Although he offers no clear timeline or definition for AGI, Zuckerberg aims to integrate it into Meta’s products. As part of this strategy, Meta is bringing its AI research group, FAIR, closer to its GenAI team, which builds AI products.
Between the lines: Interestingly, this AGI push comes just two years after the company rebranded as Meta and shifted its focus to the metaverse. Although the metaverse hasn’t taken off as expected, Zuckerberg avoids calling the AGI plan a pivot, emphasizing that it fits Meta’s long-term strategy.
Releases in Multimodal Language Models / Visual Language Models
Our colleague Alexander Shironosov, Team Leader at Everypixel, has contributed this overview of recent releases in multimodal language models:
Heavy Models:
Qwen-VL: Alibaba Cloud’s large-scale vision language model has been made available as a demo. The model outperforms GPT-4V and Gemini on several benchmarks. The Qwen-VL series has also received a significant upgrade with two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max, which bring improved image-related reasoning, more detailed image analysis, and support for high-definition images.
LLaVA-1.6: LLaVA-1.6 was released with improved reasoning, OCR, and world knowledge. Along with these performance gains, LLaVA-1.6 keeps the minimalist design and data efficiency of LLaVA-1.5 and surpasses Gemini Pro on several benchmarks.
Light Models:
Imp: imp-v1-3b, a strong multimodal small language model with 3 billion parameters, was introduced this month. Built on the small language model Phi-2 (2.7B) and the visual encoder SigLIP (0.4B), and trained on the LLaVA-v1.5 training set, Imp significantly outperforms counterparts of similar size.
Moondream1: Despite its small size (1.6B parameters), Moondream1 competes favorably with models twice its size. Trained on the LLaVA dataset, it uses SigLIP as the vision tower and Phi-1.5 as the language model.
MoE-LLaVA: Applying the Mixture of Experts (MoE) concept to Large Vision-Language Models, MoE-LLaVA achieves LLaVA-1.5-7B performance with just 2 billion parameters. Its novel training strategy, MoE-tuning, can construct a sparse model with a very large number of parameters but a constant computational cost, and it effectively addresses the performance degradation typically associated with multi-modal learning and model sparsity.
Unum: A 1.3-billion-parameter model based on a mini Llama, Unum offers pocket-sized multimodal AI for content understanding and generation. Featuring tiny embedding and generative models, Unum prioritizes speed, multilingual support, and hardware-friendly features.
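The "many parameters, constant compute" property comes from sparse routing: a router scores every expert, but only the top-k experts actually run for each input. The sketch below illustrates generic top-k MoE routing in plain Python; it is not MoE-LLaVA's MoE-tuning recipe, and the router weights, the toy "experts" (simple scaling functions), and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_w: List[Vector], experts: List[Expert],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Route x to the k highest-scoring experts and mix their outputs."""
    # Router score per expert: dot product of that expert's weight row with x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_w]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over the chosen k
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only these k experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


def make_expert(scale: float) -> Expert:
    # Hypothetical toy expert; a real one would be a feed-forward network.
    return lambda v: [scale * vi for vi in v]


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router_w = [[0.1, 0.0], [0.5, 0.0], [0.2, 0.0], [0.9, 0.0]]
    print(moe_forward([1.0, 0.0], router_w, experts, k=2))
```

Because k is fixed, adding more experts grows the parameter count while each input still pays for only k expert evaluations plus a cheap router.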
Miscellaneous
AI and higher education
Arizona State University (ASU) has joined forces with OpenAI to bring generative AI technology into higher education. ASU President Michael M. Crow expressed optimism about AI’s potential as a valuable learning tool. The partnership aims to apply OpenAI’s technology in three key areas: enhancing student success, fostering innovative research, and streamlining organizational processes. The initiative also includes developing a personalized AI tutor for STEM subjects and using AI avatars as creative study buddies.
Layoffs
The tech industry is seeing a surge in layoffs, with Salesforce planning to cut approximately 700 jobs and Google confirming cuts to a few hundred roles. One notable case is the language learning app Duolingo, which cut about 10% of its contractors and openly attributed the decision, in part, to its use of AI. Duolingo’s move fits the growing trend of using GenAI for content creation, pointing to a broader shift of tasks traditionally handled by human workers to AI tools.
Why it matters:
Setting aside the ethical considerations of the layoffs, the key point remains: AI handles technical, routine tasks as efficiently as humans, and sometimes more effectively. If you’re familiar with AI tools, you know how quickly they can process information. However, AI still struggles with the creative work at which human creators excel. Humans guide AI models, design tasks to achieve specific goals, and review the results.
This is evident in Duolingo’s case, where some contractors were let go while others involved in content curation were retained. The way the decision was communicated may raise questions, but it’s important to recognize that it makes sense from a business perspective, for better or worse.
Dmitry Shironosov, CEO of Everypixel
Taylor Swift’s AI fakes
Graphic and sexually explicit AI-generated images of Taylor Swift have inundated X, with one post amassing over 45 million views before its removal. While X’s policies explicitly prohibit such content, the platform faced criticism for delayed responses. In response, fans flooded related hashtags with genuine clips of Swift, attempting to counteract the explicit fakes.