AI News: May 7, 2026 — IBM Think Wraps, Claude Sonnet 4.6 Leads ClawBench, Anthropic JV Confirmed

Source: AIToolsRecap.com | Published: May 7, 2026

Overview

  • IBM Think 2026 concluded its four-day run in Boston with multiple general-availability product launches.
  • Claude Sonnet 4.6 achieved a top score of 33.3% on ClawBench, the first agent benchmark evaluated against live production websites.
  • Anthropic’s $1.5 billion private equity joint venture structure was formally confirmed, with major contributions from Blackstone, Hellman & Friedman, and Goldman Sachs.

IBM Think 2026 — Final Day Highlights

IBM’s annual Think conference closed May 7 in Boston with GA releases for several previewed products:

  • IBM Sovereign Core — Now GA. Embeds governance policy at the infrastructure runtime level for regulated, cross-border environments.
  • IBM Bob — Launched in tiers: Pro, Pro+, Ultra, and Enterprise SaaS. An end-to-end software development partner covering code generation, testing, security, and deployment across the full SDLC. Unlike point-in-time coding assistants, Bob operates across the entire application lifecycle.
  • Next-Gen IBM watsonx Orchestrate — Full release for multi-agent orchestration. Enables enterprises to build, deploy, and manage thousands of agents built by different teams across an organization.
  • IBM Docling for watsonx — Document intelligence platform that converts documents into structured Markdown, JSON, and HTML for RAG workflows.
  • OpenRAG on watsonx.data — Open agentic retrieval framework shipped alongside the conference close.

Claude Sonnet 4.6 Tops ClawBench

Researchers from UBC and the Vector Institute published ClawBench, a new evaluation framework for AI agents working on real-world websites:

  • Scale: 153 tasks across 144 live production websites in 15 categories, including completing purchases, booking appointments, and submitting job applications.
  • Live-Site Execution: Unlike prior sandbox benchmarks, ClawBench operates on real production sites, intercepting only the final submission request to keep evaluation safe.
  • Top Score: Claude Sonnet 4.6 scored 33.3%, the highest among all frontier models tested.
  • Behavioral Capture: Records five layers of data per run:
    1. Session replays
    2. Screenshots
    3. HTTP traffic
    4. Agent reasoning traces
    5. Browser actions
  • Evaluation: Scored by an agentic evaluator that produces step-level diagnostics.
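The live-site safeguard and the five capture layers described above can be pictured with a small sketch. This is not ClawBench's actual code: the `Request` and `RunRecord` types, the `handle_request` function, and the example URLs are all hypothetical, and the matching rule (HTTP method plus URL prefix) is an assumption about how a "final submission request" might be recognized. The idea is only that every request is logged and forwarded to the live site, except the one that would commit the purchase, booking, or application, which is held back and kept for scoring.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    """A simplified HTTP request (hypothetical type, for illustration only)."""
    method: str
    url: str
    body: str = ""


@dataclass
class RunRecord:
    """One field per capture layer named in the article; element types are placeholders."""
    session_replay: list = field(default_factory=list)
    screenshots: list = field(default_factory=list)
    http_traffic: list = field(default_factory=list)
    reasoning_trace: list = field(default_factory=list)
    browser_actions: list = field(default_factory=list)
    intercepted: Optional[Request] = None  # the held-back final submission, if any


def handle_request(req: Request, record: RunRecord,
                   final_method: str, final_url_prefix: str) -> bool:
    """Log the request; return True to forward it, False if it was intercepted.

    Assumption: the final submission is identified by method and URL prefix.
    """
    record.http_traffic.append(req)
    is_final = req.method == final_method and req.url.startswith(final_url_prefix)
    if is_final:
        record.intercepted = req  # keep it for the evaluator, never send it
        return False
    return True


# Usage: a two-request checkout flow against a made-up site.
record = RunRecord()
flow = [
    Request("GET", "https://shop.example/cart"),
    Request("POST", "https://shop.example/checkout/submit", "order=1"),
]
forwarded = [
    handle_request(r, record, "POST", "https://shop.example/checkout/submit")
    for r in flow
]
```

In this sketch the cart page load reaches the site while the checkout POST is stopped, so an evaluator could inspect `record.intercepted` to judge whether the agent would have completed the task correctly.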

Anthropic $1.5B Joint Venture — Structure Confirmed

The full structure of Anthropic’s private equity joint venture is now confirmed:

  • Vehicle Size: $1.5 billion
  • Major Contributions:
    • Anthropic: ~$300 million
    • Blackstone: ~$300 million
    • Hellman & Friedman: ~$300 million
    • Goldman Sachs: $150 million
  • Additional Participants: Apollo Global Management, General Atlantic, Leonard Green, GIC, and Sequoia Capital.
  • Operating Model: A forward-deployed enterprise services firm that embeds Claude directly into the operations of PE-backed portfolio companies.
  • Key Quote: CFO Krishna Rao said the structure exists because enterprise demand for Claude is “significantly outpacing any single delivery model.”
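As a quick arithmetic check on the figures above, the four named contributions can be summed and compared with the vehicle size. The sketch below treats the "~$300 million" figures as exactly 300, which is an approximation, and the variable names are illustrative. The article does not say how the remaining amount is divided among the additional participants, so the code reports only the remainder and does not allocate it.

```python
# Figures in millions of US dollars, taken from the list above.
# "~$300 million" entries are treated as exactly 300 (an approximation).
vehicle_size_m = 1500
named_contributions_m = {
    "Anthropic": 300,
    "Blackstone": 300,
    "Hellman & Friedman": 300,
    "Goldman Sachs": 150,
}

named_total_m = sum(named_contributions_m.values())
remainder_m = vehicle_size_m - named_total_m

# Each named contributor's share of the vehicle, in percent.
shares_pct = {
    name: round(100 * amount / vehicle_size_m, 1)
    for name, amount in named_contributions_m.items()
}

print(f"Named contributions: ${named_total_m}M of ${vehicle_size_m}M")
print(f"Not attributed in the list: ${remainder_m}M")
for name, pct in shares_pct.items():
    print(f"  {name}: {pct}%")
```

Under these assumptions the named contributions come to roughly $1.05 billion, leaving about $450 million that the list does not attribute to a specific participant.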