AI News: May 7, 2026 — IBM Think Wraps, Claude Sonnet 4.6 Leads ClawBench, Anthropic JV Confirmed

Source: AIToolsRecap.com | Published: May 7, 2026

Overview

IBM Think 2026 concluded its four-day run in Boston with multiple general-availability product launches.
Claude Sonnet 4.6 achieved a top score of 33.3% on ClawBench, the first agent benchmark evaluated against live production websites.
Anthropic’s $1.5 billion private equity joint venture structure was formally confirmed, with major contributions from Blackstone, Hellman & Friedman, and Goldman Sachs.

IBM’s annual Think conference closed May 7 in Boston with general-availability (GA) releases for several previously previewed products:

IBM Sovereign Core — Now GA. Embeds governance policy at the infrastructure runtime level for regulated, cross-border environments.
IBM Bob — Launched in tiers: Pro, Pro+, Ultra, and Enterprise SaaS. An end-to-end software development partner covering code generation, testing, security, and deployment. Unlike point-in-time coding assistants, Bob operates across the full software development lifecycle (SDLC).
Next-Gen IBM watsonx Orchestrate — Full release for multi-agent orchestration. Lets enterprises build, deploy, and manage thousands of agents created by different teams across an organization.
IBM Docling for watsonx — Document intelligence platform that converts documents into structured Markdown, JSON, and HTML for RAG workflows.
OpenRAG on watsonx.data — Open agentic retrieval framework, released as the conference closed.

Researchers from the University of British Columbia (UBC) and the Vector Institute published ClawBench, a new evaluation framework for real-world AI agents:

Scale: 153 tasks across 144 live production websites in 15 categories, including completing purchases, booking appointments, and submitting job applications.
Live-Site Execution: Unlike prior sandbox benchmarks, ClawBench operates on real production sites, intercepting only the final submission request to keep evaluation safe.
Top Score: Claude Sonnet 4.6 scored 33.3%, the highest among all frontier models tested.
Behavioral Capture: Records five layers of data per run:
1. Session replays
2. Screenshots
3. HTTP traffic
4. Agent reasoning traces
5. Browser actions
Evaluation: Scored by an agentic evaluator that produces step-level diagnostics.
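
To illustrate the live-site safeguard described above, here is a minimal Python sketch of the general idea: ordinary traffic is forwarded, while the one request that would finalize a purchase or application is captured and blocked. This is not ClawBench's actual code. The Request and SubmissionInterceptor classes, the matching rule (HTTP method plus URL prefix), and the shop.example URLs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Simplified HTTP request (hypothetical type, not ClawBench's own)."""
    method: str
    url: str
    body: str = ""


@dataclass
class SubmissionInterceptor:
    """Forwards normal traffic but captures the final submission request.

    Assumption: the final submission is identified by HTTP method plus a
    URL prefix. A real harness may use a different matching rule.
    """
    final_method: str
    final_url_prefix: str
    captured: list = field(default_factory=list)

    def is_final_submission(self, req: Request) -> bool:
        # Match on method (case-insensitive) and URL prefix.
        return (
            req.method.upper() == self.final_method.upper()
            and req.url.startswith(self.final_url_prefix)
        )

    def handle(self, req: Request) -> str:
        if self.is_final_submission(req):
            # Keep the payload for scoring, but never send it to the site.
            self.captured.append(req)
            return "blocked"
        return "forwarded"


# Hypothetical checkout flow: only the last request would place an order.
interceptor = SubmissionInterceptor("POST", "https://shop.example/checkout/submit")
traffic = [
    Request("GET", "https://shop.example/cart"),
    Request("POST", "https://shop.example/cart/add", "sku=123"),
    Request("POST", "https://shop.example/checkout/submit", "order=abc"),
]
outcomes = [interceptor.handle(r) for r in traffic]
```

In this sketch the agent still browses and fills forms on the real site, and the captured payload could later be handed to an evaluator to check whether the task was completed correctly, without any order or application actually being submitted.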

The full structure of Anthropic’s private equity joint venture is now confirmed:

Vehicle Size: $1.5 billion
Major Contributions:
- Anthropic: ~$300 million
- Blackstone: ~$300 million
- Hellman & Friedman: ~$300 million
- Goldman Sachs: $150 million
Additional Participants: Apollo Global Management, General Atlantic, Leonard Green, GIC, and Sequoia Capital.
Operating Model: A forward-deployed enterprise services firm that embeds Claude directly into the operations of PE-backed portfolio companies.
Key Quote: Anthropic CFO Krishna Rao said the structure exists because enterprise demand for Claude is “significantly outpacing any single delivery model.”