2024 was a year in which innovation happened at every layer of the AI stack, from hardware advances to breakthrough applications, making AI development more accessible and more powerful than ever before. Let’s take a look at the developments that transformed the field over the past year.
Infrastructure Layer: Building Better Foundations
The hardware powering AI evolved dramatically in 2024, with advances spanning both the data center and the edge.
NVIDIA
Blackwell Architecture (March)
NVIDIA's Blackwell architecture, unveiled in March, delivered significant performance improvements for GenAI workloads. The platform enabled organizations to run trillion-parameter large language models (LLMs) at up to 25x less cost and energy consumption than the previous Hopper generation.
Jetson Orin Nano Super (December)
December saw NVIDIA release its most affordable AI computing platform yet: the Jetson Orin Nano Super. The compact device achieved a 1.7x leap in GenAI performance and a 50% increase in memory bandwidth, while dropping its price to $249. This made sophisticated AI processing accessible to developers, students, and hobbyists.
Foundation Models Layer: The Race Accelerates
The most visible AI advances of 2024 came at the model layer, where competition drove rapid innovation across multiple fronts.
OpenAI
GPT-4o (May)
In May, OpenAI launched GPT-4o, a model that seamlessly integrated text, vision, and audio processing. Setting new industry benchmarks for multilingual, audio, and vision capabilities, the model offered striking efficiency gains, operating at twice the speed and half the cost of GPT-4 Turbo. With its 128,000-token context window, GPT-4o demonstrated that a single model could effectively handle complex tasks across modalities.
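For illustration, here is roughly what a combined text-and-image request to GPT-4o looks like through OpenAI's Python SDK; the prompt and image URL are placeholders.

```python
# Minimal sketch: one GPT-4o request mixing text and image inputs
# via OpenAI's chat completions API. Requires OPENAI_API_KEY to be
# set in the environment; the prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```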
GPT-4o Mini (July)
In July, OpenAI launched GPT-4o Mini, making advanced AI more accessible to developers and businesses. Priced at $0.15 per million input tokens, more than 60% cheaper than GPT-3.5 Turbo, the model significantly lowered the barrier to entry for enterprise AI deployment while maintaining strong performance. It became the default ChatGPT model for non-logged-in users and for those who reached their GPT-4o usage limits.
o1 Model (September)
September brought o1, OpenAI's model focused on deep reasoning. The model was designed to spend more time processing problems, similar to how a human would approach complex tasks. This methodical approach paid off: o1 achieved PhD-level performance in physics, chemistry, and biology, and solved 83% of problems on a qualifying exam for the International Mathematics Olympiad. The model demonstrated that giving AI systems more time to 'think' could lead to dramatically improved results.
o3 Model (December)
December saw OpenAI announce o3, which introduced 'deliberative alignment' and adjustable computing power: the more time the model spent thinking, the better it performed. On the ARC-AGI benchmark, which measures general-intelligence capabilities, o3 achieved an unprecedented 87.5% score when given maximum computing resources. The model also significantly advanced mathematical reasoning and coding capabilities, suggesting a future where AI systems could dynamically adjust their processing based on task complexity.
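o3 itself was announcement-only at the time, but the idea of dialing compute up or down is already visible in OpenAI's API for its o-series reasoning models, which accept a reasoning_effort parameter. A minimal sketch, assuming API access to o1:

```python
# Sketch of adjustable 'thinking time' on an o-series reasoning model.
# Assumes API access to the o1 model; o3 was announcement-only at the
# time, so o1 stands in here to illustrate the idea.
from openai import OpenAI

client = OpenAI()

for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o1",
        reasoning_effort=effort,  # higher effort = more internal reasoning
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    )
    print(f"--- reasoning_effort={effort} ---")
    print(response.choices[0].message.content)
```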
Sora (December)
December also saw the release of Sora, OpenAI's breakthrough in video generation. The model can create videos at up to 1080p resolution in widescreen, vertical, or square formats, with sophisticated control over length and style. Sora shows a notable grasp of physics in its generated content, marking a significant step toward AI that can simulate real-world motion and interaction.
Anthropic
Claude 3 (March)
March saw Anthropic release the Claude 3 model family: Haiku, Sonnet, and Opus. The family marked a breakthrough in balancing power and speed: models could maintain high performance while responding significantly faster. Sophisticated vision capabilities and advanced knowledge processing showed how AI could seamlessly handle multiple types of tasks that previously required specialized systems.
Claude 3.5 Sonnet (June)
Anthropic's Claude 3.5 Sonnet emerged as a surprise leader in reasoning and analysis tasks. The model excelled in visual reasoning tasks like chart interpretation while setting benchmarks in graduate-level reasoning, undergraduate knowledge, and coding proficiency. Most notably, it achieved these advances while operating at twice the speed of Claude 3 Opus and one-fifth the cost. This combination of better performance and lower resource requirements suggested a future where advanced AI capabilities would become increasingly accessible.
Google
Gemini 1.5 Pro (February)
February brought Gemini 1.5 Pro, featuring breakthrough advances in long-context understanding. The model could process up to one million tokens consistently — handling hour-long videos, 11 hours of audio, or 700,000 words of text in a single prompt. This massive context window, combined with robust performance that maintained accuracy even at scale, marked a significant step forward in AI's ability to understand and process large amounts of information.
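As a rough sketch, here is what feeding a long document to Gemini 1.5 Pro looked like with Google's google-generativeai Python SDK; the file path, question, and API key are placeholders.

```python
# Rough sketch: pushing a very long document through Gemini 1.5 Pro's
# large context window with the google-generativeai SDK.
# The file path, question, and API key are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

with open("long_report.txt") as f:
    long_document = f.read()  # can run to hundreds of thousands of words

response = model.generate_content(
    [long_document, "Summarize the main themes of this document."]
)
print(response.text)
```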
Veo (May)
In May, Google introduced Veo, an advanced video generation model capable of producing 1080p videos longer than a minute. The model supported various cinematic styles including time-lapse and aerial shots, with plans for integration into YouTube Shorts.
AI Search Evolution (May)
May also saw Google transform its search experience with AI Overviews, AI-generated summaries that appear above traditional results. This marked a fundamental shift in how billions of people access information online, moving from lists of links to contextual, synthesized answers. This real-world deployment of GenAI at massive scale provided valuable lessons about how users interact with AI-generated content.
Meta
Llama 3 Series (April-September)
Meta's aggressive open-source development changed the dynamics of the AI industry. Starting with efficient 8B models and scaling to a 405B-parameter version, the Llama 3 series proved that state-of-the-art AI didn't need to be proprietary.
The April release offered 8B and 70B parameter models trained on 15 trillion tokens, excelling in coding and multilingual tasks. By July, Llama 3.1 scaled up to 405B parameters, rivaling leading commercial models on key benchmarks. September's Llama 3.2 added vision capabilities and introduced mobile-optimized 1B and 3B parameter models, making state-of-the-art AI accessible across computing environments from smartphones to data centers.
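Because the weights are open, the smaller models run locally with standard tooling. A minimal sketch with Hugging Face transformers, assuming your account has been granted access to the gated meta-llama checkpoint:

```python
# Minimal local-inference sketch for the 1B Llama 3.2 model using
# Hugging Face transformers (pip install transformers torch accelerate).
# Assumes access to the gated meta-llama checkpoint has been approved.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts in one paragraph."}
]
output = generator(messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])  # the assistant's reply
```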
xAI
Grok Evolution (March-August)
In March, xAI open-sourced Grok-1, a 314 billion parameter Mixture-of-Experts model, under the Apache 2.0 license. By August, Grok-2 demonstrated significant advances, outperforming leading models on the LMSYS leaderboard and achieving 87.5% on the MMLU benchmark. The simultaneous release of Grok-2 mini showed xAI's commitment to making advanced AI accessible at different scales. Grok-2 showed particular strength in reasoning with retrieved content and content analysis, while improving its ability to identify and discard irrelevant information.
Amazon
Nova Series (December)
December saw Amazon enter the race with its Nova series of foundation models. Announced at AWS re:Invent, the family pairs the Micro, Lite, and Pro understanding models with Canvas for image generation and Reel for video generation, all focused on lowering costs and reducing latency. This marked Amazon's push to provide competitive foundation models directly through its cloud infrastructure.
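As a hedged sketch, invoking a Nova model through Amazon Bedrock's Converse API with boto3 looks roughly like this; the model ID follows AWS's naming at launch and should be checked against your region's model catalog.

```python
# Sketch: calling an Amazon Nova model through the Bedrock runtime's
# Converse API with boto3. Requires AWS credentials with Bedrock access;
# the model ID is an assumption based on AWS's naming at launch.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed ID; verify in your region
    messages=[
        {"role": "user", "content": [{"text": "Write a haiku about re:Invent."}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```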
Application Layer: Moving From Lab to Production
While infrastructure and model developments grabbed headlines, 2024's application layer innovations showed how AI could be applied in impactful and creative ways.
Cognition Labs
Devin AI (March)
March saw Cognition Labs introduce Devin AI, an autonomous software development assistant. The system could debug existing code, generate new code, and solve complex programming problems based on natural language prompts. This marked a step toward AI systems that could handle sophisticated software development tasks independently.
ElevenLabs
Professional Voice Cloning (April)
In April, ElevenLabs launched its Professional Voice Cloning service, enabling users to create digital voice replicas of themselves. The service rapidly expanded to support almost 30 languages, paving the way for automatic localization of content across languages while maintaining natural-sounding speech. This breakthrough suggested a future where language barriers in media and communication could be dramatically reduced through AI voice translation.
Suno
AI Music Studio (December)
Suno transformed AI music generation from a novelty into a practical creative tool. Its December release moved beyond simple melody generation to produce complete songs with more sophisticated arrangements and realistic vocals in multiple genres. The result is an AI application that gives voice to musical creativity through simple text prompting.
What Does This Mean for Your GenAI Strategy?
As 2024 demonstrated, GenAI is advancing at a breakneck pace. From foundational hardware breakthroughs to sophisticated applications, the underlying technology is evolving faster than ever. For organizations, this raises critical questions: How can you build AI applications that stay relevant as the technology changes? And how do you manage the complexity of coordinating models and agents across a growing number of use cases while maintaining control over costs and risks?
The Dataiku LLM Mesh can help address these challenges. It provides a unified backbone for GenAI, allowing you to update and integrate new technologies without scrapping your current applications. At the same time, it offers a clear, centralized way to monitor and govern AI usage across your organization, keeping your systems reliable and secure.