This Week in AI: OpenAI’s Custom Chip, Gemini Gets Computer Use, and Qualcomm Buys Modular

June 24, 2026 turned out to be one of the most packed days in recent AI news. Three major announcements dropped almost simultaneously: OpenAI revealed its first custom inference chip built with Broadcom, Google integrated computer use directly into Gemini 3.5 Flash, and Qualcomm announced it’s acquiring Modular, the company behind the Mojo programming language and the MAX inference engine. Each story signals a different shift in the AI landscape — let’s break them down.

OpenAI’s Jalapeño: A Custom Chip for Inference

OpenAI has officially unveiled Jalapeño, its first custom-built inference processor. Designed and manufactured in collaboration with Broadcom, the chip is purpose-built for the specific demands of OpenAI’s inference workloads — a notable departure from the company’s reliance on Nvidia GPUs.

The partnership between OpenAI and Broadcom was first announced in October 2025, but the chip itself remained under wraps until now. What makes Jalapeño interesting isn’t just the silicon itself — it’s that OpenAI used its own AI models to assist in the chip’s development. In a statement, the company emphasized that early testing shows significantly better performance-per-watt than current state-of-the-art alternatives.

Jalapeño is focused squarely on inference — the process of running pre-trained models to generate responses — not training. This is a deliberate strategic choice. Training happens in periodic bursts and can absorb high costs, but inference runs 24/7 at massive scale, making it the dominant cost driver for any AI company serving millions of users. Even modest efficiency gains in inference translate directly to improved margins.

OpenAI president Greg Brockman explained the rationale on the company’s podcast: “We have a deep understanding of the workload. We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”

The move mirrors what Google and Amazon have been doing for years with their custom AI accelerators (TPUs and Trainium/Inferentia respectively). By designing chips optimized for their own model architectures and serving patterns, these companies can squeeze out efficiency that generic hardware simply can’t match. OpenAI is now following the same playbook — owning the entire stack from model to silicon.

Gemini 3.5 Flash Gets Native Computer Use

Google announced that computer use is now a built-in tool in Gemini 3.5 Flash, rather than requiring a separate specialized model. Previously, computer use was only available through a standalone Gemini 2.5 computer use model. Now it’s integrated natively into the main Flash model, which means developers get both general-purpose reasoning and computer-use capabilities in a single API call.

This matters because it unlocks a class of agentic applications that can see, reason, and take action across browser, mobile, and desktop environments. Think continuous software testing, automated workflows across enterprise applications, and knowledge work that spans multiple tools. The key word here is long-horizon — Google explicitly positions this for tasks that require sustained multi-step interaction, not just simple single-action automations.

Safety is a legitimate concern when AI agents interact with live systems. Google addressed this with two enterprise safeguards: explicit user confirmation for sensitive or irreversible actions, and automatic task suspension when indirect prompt injection is detected. The model itself also received targeted adversarial training specifically for computer use scenarios. Google recommends combining these with sandboxing, human-in-the-loop verification, and strict access controls — a defense-in-depth approach that mirrors what responsible security engineering looks like.

Developers can access computer use in 3.5 Flash through the Gemini API and the Gemini Enterprise Agent Platform. The fact that this landed in the Flash model — Google’s efficient, cost-effective tier — rather than being gated behind a premium model is telling. Google wants computer use to be accessible and widely adopted, not a niche capability.

Qualcomm Acquires Modular: What It Means for AI on Edge

The third major announcement came from Qualcomm, which is acquiring Modular — the AI infrastructure startup founded by Chris Lattner and Tim Davis. Modular is best known for two products: Mojo, a programming language designed for AI workloads that combines Python’s ergonomics with systems-level performance, and MAX, a universal inference engine that can run models from any framework (PyTorch, TensorFlow, JAX) with hardware-agnostic optimization.

For Qualcomm, this acquisition makes strategic sense. The company’s chips already power the vast majority of Android devices and are making aggressive moves into AI PCs and automotive. The challenge has been software: getting AI models to run efficiently on Snapdragon hardware requires deep integration between the model layer and the silicon. MAX’s hardware-agnostic compilation and optimization stack could become the bridge that lets Qualcomm compete more effectively against Nvidia and Apple in on-device AI inference.

Mojo’s future under Qualcomm is less certain but potentially more interesting. A systems-level programming language designed specifically for AI workloads could give Qualcomm a differentiated developer experience if they position it as the preferred way to write high-performance AI code targeting Snapdragon chips. Alternatively, Mojo’s compiler technology could be absorbed into MAX and the language itself sunset — only time will tell.

The Bigger Picture: Vertical Integration Is the Play

What ties these three stories together is the relentless push toward vertical integration in AI. OpenAI is building its own inference chips. Google is integrating computer use directly into its efficient models. Qualcomm is acquiring the software stack it needs to make its hardware more competitive. The message is clear: in 2026, the AI companies that control the most layers of the stack have a structural advantage.

For developers, this means a few practical things. First, inference costs are likely to keep dropping as custom silicon proliferates — good news if you’re running AI-powered services at scale. Second, agentic capabilities are moving from experimental to production-ready, with Gemini 3.5 Flash’s native computer use being a clear signal. Third, the AI infrastructure layer is consolidating, which could mean fewer choices in tooling but better integration within each ecosystem.

The competitive dynamics are shifting fast. A year ago, Nvidia’s dominance in AI hardware seemed unassailable. Today, OpenAI, Google, Amazon, and now Qualcomm (via Modular) are all building or acquiring their way toward custom silicon with optimized software stacks. The next 12 months will be about who executes on that integration most effectively.

OpenAI’s Jalapeño: A Custom Chip for Inference

Gemini 3.5 Flash Gets Native Computer Use

Qualcomm Acquires Modular: What It Means for AI on Edge

The Bigger Picture: Vertical Integration Is the Play

Leave a Reply Cancel reply