Weekly Conversations About Building AI Agents
Join Mastra cofounders Shane Thomas and Abhi Aiyer for weekly conversations about the latest in AI.
They discuss breaking AI news, chat with guests from the industry, and go deep on the technical challenges of building AI agents.
Breaking AI News, Technical Deep Dives and Guests From the Industry
Latest Episodes
May 13, 2026
#84
Anthropic × SpaceX, the Services Wars & HTML Is the New Markdown | This Week In AI
Shane and Abhi bring you a new batch of AI news. Anthropic strikes a compute deal with SpaceX. They immediately double Claude Code's rate limits and raise API rate limits for Opus. Corgi launches AI Coverage — insurance for when your AI messes up — plus a $160M Series B. Jarred Sumner says Robobun is the top contributor to Bun, then quietly tries rewriting Bun in Rust. It passes 99.8% of the test suite. OpenAI and Anthropic both announce vertically integrated AI services companies. OpenAI launches the Deployment Company, a consortium of 19 investment firms and SIs. Anthropic teams up with Blackstone, Hellman & Friedman, and Goldman Sachs on a parallel firm for the mid-market. Anthropic ships financial services agent templates, brings Claude Platform to AWS as GA, and launches Dreaming in Managed Agents — offline memory consolidation Anthropic calls REM sleep for your agent. Terminal-Bench 2.1 ships with a public audit. WorkOS releases Horizon, a self-driving codebase. Shopify releases River, an agent that lives in Slack and is available only in public channels. Coinbase cuts 14%. Brian Armstrong attributes it to market plus AI. Elad Gil's framing of the AI productivity throughline gets co-signed by Andreessen. Braintrust confirms a breach. Thariq says HTML is the new Markdown. Karpathy co-signs. Ramp Labs publishes how they used Prime-RL post-training to build a spreadsheets agent faster than Opus and almost as fast as Haiku 4.5. OpenAI's big week: GPT-Realtime 2, Codex in Chrome, ChatGPT in Excel and Google Sheets. Google's quiet week: Gemini 3.1 flash-lite, Gemma 4 up to 3x faster, File Search multi-modal. ERNIE 5.1 approaches SOTA at ~6% of the cost. AI Agents Hour is a weekly livestream by Mastra CPO Shane Thomas and CTO Abhi Aiyer. Mondays, 12PM Pacific.
May 6, 2026
#83
Codex Adds Pets, Cursor Ships an SDK & Claude Connects to Blender and Ableton - This Week In AI
Shane and Abhi are in person at the CodeRabbit studio, and AISI just quietly torched one of Anthropic's loudest narratives. AISI confirmed GPT-5.5 is the second model to complete a multi-step cyber attack simulation end-to-end. The first was Mythos. David Cramer calls TUIs "caveman shit." Kenzie at Browserbase builds an agent in under ten minutes that ranks every SF tech event by free food probability. Codex ships Tamagotchi-style pets. Apple accidentally leaves CLAUDE.md files in a support app update. Cursor releases its SDK. OpenCode 2.0 becomes embeddable. Matt Pocock drops Sandcastle. Warp goes open source. The harnesses are becoming frameworks, and the frameworks are growing harnesses. Anthropic ships connectors for Blender. Claude Security enters public beta. /goal lands in Codex CLI as OpenAI's take on the Ralph loop. OpenAI says GPT-5.5 is its strongest launch yet — API revenue 2x faster than any prior release, Codex revenue doubling in seven days. Vasuman posts an essay on why building real agents is harder than the hype suggests. Open weights keep closing the gap. Kimi K2.6 beats Claude, GPT-5.5, and Gemini at a programming contest. Qwen3 6.27B takes the open weights crown under 150B parameters. Mistral Medium 3.5 lands as a 128B dense model with 256k context. GitHub has a rough week. Wiz Research discloses an RCE achievable with a single git push. Agents are becoming customers. Stripe Link is the wallet for agents. Cloudflare lets agents start paid subscriptions. Doola and Replit will form a US LLC inside the chat. Ramp's coding agent now writes 70% of merged PRs. DeepSeek's input cache is 10x cheaper. Node 20 hits EOL, Zod prepares to drop CommonJS, and TypeScript native previews ship.
April 30, 2026
#81
Have We Hit an AI Wall? GPT-5.5, Anthropic's Meltdown, and Elon vs. OpenAI - This Week In AI
An AI agent destroyed a production database and confessed in writing. A law firm submitted AI hallucinations to court. Anthropic's status page shows 98.65% uptime — about five days of downtime a year. Have we hit a wall? GPT-5.5 lands. Codex hit 4 million users in two weeks. Peter Yang's F-Zero test — which no model had cleared before — finally fell to GPT-5.5 with Codex. Lovable reports 23.1% fewer tool calls and 12.5% higher scores on the hardest benchmarks. Kimmonismus calls it the Claude Mythos level for public use. Codex 5.5 unprompted started SIGKILL-ing Claude Code processes. Elon goes nuclear. OpenAI calls the lawsuit baseless and demands Musk on the stand. Musk fires back, calling Sam Altman "Scam Altman" and accusing him and Greg Brockman of stealing a charity. Mid-war, SpaceX announces SpaceXAI and Cursor are now working closely together — Cursor's distribution paired with Colossus's million-H100-equivalent compute, with SpaceX holding the right to acquire Cursor for $60 billion. The Anthropic dam keeps cracking. Claude Code pulled from Pro — same product, 5x the price overnight. Opus 4.7 regressed on the BridgeBench Bullshit Benchmark, accepting made-up jargon 24% of the time. Bloomberg reports the unreleased Mythos model was accessed by unauthorized users. Om Patel got billed $200 in a day because his repo had a HERMES.md file. The community shipped clawd.rip — every Claude incident since 2023, cataloged. Google plans to invest up to $40 billion in Anthropic and announced 960,000 Rubin GPUs at Cloud Next. AWS struck a strategic partnership with OpenAI. David Silver left DeepMind to raise a $1.1 billion seed. Open weights are eating the world. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index and #1 on Design Arena, ahead of Opus 4.7. DeepSeek V4 ships at 1/20th the cost of Opus 4.7. 
OpenAI also shipped Chronicle memory for Codex, workspace agents in ChatGPT, Images 2.0, the open-weight Privacy Filter, and Symphony — an open-source Codex orchestration spec.
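The uptime figure in this episode's summary can be checked with a line of arithmetic. The sketch below is only an illustration: the function name is made up here, and it assumes a 365-day year with downtime spread evenly.

```typescript
// Illustrative helper (name is hypothetical): convert an uptime percentage
// into expected days of downtime per year, assuming a 365-day year.
function downtimeDaysPerYear(uptimePercent: number, daysPerYear = 365): number {
  return (1 - uptimePercent / 100) * daysPerYear;
}

// 98.65% uptime works out to a little under five days of downtime a year.
console.log(downtimeDaysPerYear(98.65).toFixed(2));
```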
April 25, 2026
#80
Build your first AI agent in 90 minutes
The guy who taught Abhi JavaScript is back! Guil Hernandez has spent 15+ years teaching developers. His courses at Treehouse, Scrimba, and LinkedIn Learning have reached over 500,000 learners — including Abhi and Shane, who both learned JavaScript and CSS from him. He just released Mastra's first video course at https://mastra.ai/learn, and it's free. "Build Your First Agent in TypeScript" is a 90-minute, hands-on course that takes you from zero to a deployed agent. Fourteen lessons across five sections: agents, tools, workflows, memory, and production. The project is a theme park planner agent — pulls live wait times, weather, and park hours, keeps track of what you like, and builds you an itinerary. Everything runs in Mastra Studio, so you can inspect traces, tool calls, and behavior as you go. You'll see how to wire up local tools and MCP servers side by side, how message history and observational memory change agent behavior, how to compose a workflow for a mock ticket purchase, and how to expose the whole thing as an HTTP server with one-click Slack integration. Guil also shares his broader take on teaching AI engineering. The mechanics — syntax, boilerplate, wiring — are no longer the hard part. What matters now is how you think through a problem, whether you have the taste to spot bad output, and when to take the handoff from the AI instead of iterating forever. The gap between people who just generate output and people who can actually shape it keeps widening. This course is built for the second group. Start here: https://mastra.ai/learn
April 22, 2026
#79
Vercel Got Hacked, Lovable Blamed Users, and Opus 4.7 Costs More Than You Think - This Week in AI
A Vercel employee's Google Workspace was compromised via a third-party AI tool — attackers pivoted from the OAuth app into Vercel's environment variables, moving at a speed attributed to AI assistance. René Brandel, founder of Casco (YC X25) and ex-founding member of AWS's Generative AI team, joins live to break down the attack chain and walk through the exact Google Workspace admin setting that could have prevented it. In a separate incident, every Lovable project created before November 2025 was readable by any free account, exposing database credentials and chat histories. Their response blamed unclear documentation rather than the underlying issue — and the contrast with Vercel's handling is stark. Beyond security: Claude Opus 4.7 launched to mixed reactions. The benchmarks look good, but Simon Willison measured the new tokenizer at 1.46x the tokens of 4.6 on identical content — at unchanged prices, that's a ~40% cost increase, and 3x for images. Anthropic's own docs said 1–1.35x. Independent measurements landed at 1.47x. Theo called the redesign "vibe-coded," and a locally run open-source Qwen model drew a better pelican SVG than Opus 4.7 at thinking level max. Anthropic launched Claude Design, which lets you make prototypes, slides, and one-pagers by talking to Claude, powered by Opus 4.7. OpenAI shipped a major Agents SDK update with Codex memory and GPT-Rosalind for biomedical research. Cloudflare shipped Artifacts and memory primitives for agents, Factory AI raised $150M at $1.5B, Qwen 3.6-35B went Apache 2.0.
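To make the tokenizer math in this summary concrete, here is a small sketch. It assumes cost scales linearly with token count at an unchanged per-token price, which is a simplification; the function name is invented for illustration. Under that assumption the ratios quoted above (1.35x, 1.46x, 1.47x, and 3x for images) map directly to percentage increases.

```typescript
// Illustrative only: assumes billing is strictly proportional to token count
// and that the per-token price is the same before and after the change.
function costIncreasePercent(tokenRatio: number): number {
  return (tokenRatio - 1) * 100;
}

// Ratios taken from the episode summary.
for (const ratio of [1.35, 1.46, 1.47, 3]) {
  console.log(`${ratio}x tokens: about ${costIncreasePercent(ratio).toFixed(0)}% more cost`);
}
```

Under this simple model, 1.46x comes out at roughly 46%, in the same ballpark as the rounded ~40% quoted in the summary.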
April 15, 2026
#78
Proof that Opus 4.6 Is Getting Worse, Ramp AI Coworker, MiniMax M2.7 & More (This Week In AI)
Mounting evidence that Claude Opus 4.6 has been degraded — BridgeBench shows a 15-point accuracy drop on their hallucination benchmark, and AMD's Senior AI Director found median thinking collapsed from ~2,200 to ~600 characters between January and March. The hosts share their own experiences, and they line up. Meanwhile, a claim surfaced that Cursor Agent is a rebranded version of Claude Code, running behind a local proxy with a find-and-replace engine that swaps "Claude" for "Cursor" in system prompts. Cursor's Michael Truell responded, saying it was a sub-1% A/B test. The hosts break down both sides. On the shipping front, Anthropic launched Claude Managed Agents in public beta, released Claude for Word, shared details on Claude Mythos Preview — including speculation that it's a looped language model based on a ByteDance paper — and expanded its Google/Broadcom partnership for multiple gigawatts of compute. Their run rate reportedly jumped from ~$9B to $30B in four months. Sam Altman published a personal blog post revealing that someone threw a Molotov cocktail at his house. Plus: why senior executives are voluntarily dropping title to join AI companies, Ramp's internal AI productivity suite Glass, Ramp Labs' Latent Briefing paper showing 31% token savings for multi-agent systems, Scale AI's Muse Spark model now powering Meta AI, GLM-5.1 breaking into Code Arena's top 3, MiniMax shipping MMX CLI and open-sourcing M2.7, and widespread benchmark cheating exposed across nine agent benchmarks.
April 9, 2026
#77
OpenAI Buys TBPN, Anthropic DMCA's 8,000 Repos, Milla Jovovich Builds Memory (This Week In AI)
OpenAI acquired TBPN — the daily tech news show — announced the day after April Fools. TBPN built an independent voice in tech media over eighteen months, and OpenAI saw that as worth buying. AI companies acquiring media is a new pattern.
April 2, 2026
#76
Anthropic Leaked Their Own Source Code, OpenAI Raised $122b, and Axios Got Hacked (This Week In AI)
Shane and Abhi bring you your weekly roundup of AI news! Claude Code's entire source code leaked via an exposed .map file in npm — 512,000 lines of TypeScript, 50K GitHub stars before DMCAs started flying. What people found: Claude Code uses ~20 tools, and there's a regex that silently logs user frustration to analytics. Same week, a CMS misconfiguration exposed a draft blog post revealing Mythos and Capybara — a new model tier above Opus described as posing "unprecedented cybersecurity risks." Fortune separately confirmed a source saying Opus 5 is "so good it poses a danger."
March 26, 2026
#75
Claude Uses Your Computer, OpenAI Buys Python Tools & The Cursor/Kimi Plot Twist (This Week In AI)
Shane and Abhi kick off with a viral quote: if your $500K engineer isn't burning $250K in tokens, something is wrong. OpenAI is acquiring Astral — the team behind uv and Ruff — joining the Codex team. OpenAI bets on Python; Anthropic bet on TypeScript with Bun. Then Cursor drama: someone found Composer 2 is powered by Kimi K2.5, Kimi confirmed it, and raised another $1B at an $18B valuation — three rounds in 90 days. Anthropic shipped Claude Code Channels (Telegram/Discord control), Cowork Dispatch (persistent agent, message from phone), and a deep dive on how they use Skills. Matt Pocock found quality drops past 100K on the 1M context window. And 52 million views on enabling Claude to use your computer — Mac only. Stripe launched MPP for agent-to-agent payments. Better Auth launched the Agent Auth Protocol. Cloudflare shipped Dynamic Workers for AI-generated code in isolates. LangChain open-sourced Deep Agents, Composio shipped 30-parallel-agent orchestration, OpenCode lost its Claude Max plugin after Anthropic sent lawyers, and Netlify and Google Stitch entered vibe coding and design. EsoLang-Bench: LLMs score 85–95% on standard benchmarks but collapse to 0–11% on esoteric languages — memorization, not reasoning. Quick hits: GPT-5.4 mini/nano, MiniMax M2.7, Morph FlashCompact, AI CMO, Letta pivots to coding agents, GLM-OCR, LiteLLM supply chain attack.
March 24, 2026
#74
Email Broke Productivity - It's Time To Fix It (with Brett and Naveen from Micro)
Brett Goldstein and Naveen Sreekandan from Micro join Shane and Abhi to talk about why they believe the future of productivity looks completely different from what we have today. Micro is an all-in-one productivity platform: email client, CRM, calendar, tasks, docs, meeting notes, and a powerful AI agent, all built on a unified graph where every object (like emails, people, companies, meetings, documents) is interconnected. The thesis is simple but bold: email isn't just a list of messages to get through. It's the world's most-used CRM, travel app, hiring tool, and developer notification system. Micro restructures that data so each use case actually feels like the right tool for the job — your sales pipeline as a Kanban board, your GitHub notifications as a task board, your contacts fully enriched from every email and meeting you've ever had. Brett walks us through the demo: the daily orchestrator automation that audits itself, updates its own prompt, generates your day plan, and has even prepped talking points for this interview. Context docs let the agent know everything it needs. The CRM auto-fills and auto-updates from emails and meeting notes. The X integration lets the agent pull recent posts from anyone you're about to meet. Naveen covers the architecture: built on Mastra, using agent and workflow primitives on top of a graph-based data model backed by Postgres with a custom query layer called Prism. One main agent with dynamic context injection handles both chat and automations — the agent knows whether it's in automation mode (just give the output) or chat mode (ask follow-up questions). Supermemory powers vector search. Dedicated sub-agents handle specific workflows, such as email labeling and meeting note summarization.
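The "one agent, two modes" idea Naveen describes can be pictured with a short sketch. This is not Micro's code or Mastra's API: the `Mode` type, the `RunContext` shape, and `buildInstructions` are hypothetical names, and the prompt wording is invented. It only shows how a single agent's instructions might be assembled per run from the mode and the available context docs.

```typescript
// Hypothetical sketch of dynamic context injection; all names are invented.
type Mode = "chat" | "automation";

interface RunContext {
  mode: Mode;
  contextDocs: string[]; // assumed: free-form context documents supplied by the user
}

// Build per-run instructions for one shared agent.
function buildInstructions(base: string, ctx: RunContext): string {
  const modeRule =
    ctx.mode === "automation"
      ? "You are running as an automation. Return only the final output and do not ask questions."
      : "You are chatting with the user. Ask follow-up questions when the request is ambiguous.";
  const docs = ctx.contextDocs.map((doc, i) => `Context ${i + 1}: ${doc}`);
  return [base, modeRule, ...docs].join("\n");
}

// Usage: the same base prompt yields different behavior rules per mode.
const base = "You are a productivity assistant.";
console.log(buildInstructions(base, { mode: "automation", contextDocs: ["Team priorities for the week"] }));
console.log(buildInstructions(base, { mode: "chat", contextDocs: [] }));
```

Keeping the mode rule as injected context, rather than maintaining two separate agents, means tools and memory stay shared while only the interaction style changes.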
March 20, 2026
#73
Two Lines of Code to Lock Down Your Agents - Mastra Studio Auth
Mastra Studio started as a local playground for developers to test agents and workflows without having to spin up a custom UI. But as the feature set grew, teams started asking: how do we share this with non-technical teammates? How do we control what different users can do? Ryan, an engineer at Mastra, walks through the new Mastra Studio Auth — now baked directly into Studio. Starting with simple token-based auth (two lines of config), you can lock down your Studio from the open internet. From there, RBAC lets you map roles to granular permissions — 80 auto-generated permissions derived directly from Studio's routes and handlers, controllable via wildcard patterns. Out-of-the-box providers include WorkOS, Auth0, Supabase, Firebase, and Clerk, with GitHub and others in open PRs. The team also discusses what's coming next: audit logs so you can see exactly what an agent did, why it accessed a given tool, and whether it should have. Auth for agents in production isn't magic — your tool files still need to check permissions — but Mastra handles the plumbing so you can focus on building securely.
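For a feel of how wildcard patterns can map roles to granular permissions, here is a minimal sketch. It is not Mastra Studio's actual API or permission list: the role names, the `resource:action` permission strings, and the `can` helper are all assumptions made for illustration.

```typescript
// Hypothetical wildcard RBAC sketch; not Mastra's real config or permission names.
type Role = "admin" | "operator" | "viewer";

// Each role maps to a list of patterns; "*" matches any run of characters.
const rolePatterns: Record<Role, string[]> = {
  admin: ["*"],
  operator: ["agents:*", "workflows:read"],
  viewer: ["*:read"],
};

// Turn a wildcard pattern into an anchored regular expression,
// escaping every other regex metacharacter.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// A role is allowed if any of its patterns matches the requested permission.
function can(role: Role, permission: string): boolean {
  return rolePatterns[role].some((p) => patternToRegExp(p).test(permission));
}

console.log(can("viewer", "agents:read"));      // true
console.log(can("viewer", "agents:execute"));   // false
console.log(can("operator", "agents:execute")); // true
```

As the episode notes, a check like this only covers the plumbing; tool code would still need to enforce its own permission checks.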
March 18, 2026
#72
NVIDIA GTC, The Death of MCP, and AI Agents Are Hiring Humans - This Week in AI
Shane hosts this week's news from his usual studio while Abhi joins remotely from NVIDIA GTC 2026 in San Jose. Jensen Huang's keynote set the tone: NVIDIA is doubling down on AI factories, pushing 100x more token throughput, and helping bring OpenAI onto AWS infrastructure.
