
Claude Code 2.1.145 / Codex Mobile / Anthropic's Six-Week Quality Postmortem — Third-Week-of-May 2026 "Coding-AI Infrastructure Era" Roundup

In the third week of May 2026, coding AI shipped three back-to-back "infrastructure era" events. Claude Code 2.1.145 (Fast mode defaults to Opus 4.7, OTel agent_id spans, 50+ improvements), OpenAI Codex mobile (iPhone QR-pair a Mac Codex session, 4M weekly users), and Anthropic publishing the technical detail of a six-week quality regression (three product changes — reasoning-effort downgrade / caching bug / verbosity limit — chained together). Read across three trends: tool → infrastructure, freedom from place and time, and SRE-grade quality accountability.

中澤 圭志

@keishi_nakazawa

Sales Claw maintainer

·16 min
This English article is a concise version of the original. For the full Japanese deep-dive, see the Japanese original.

Key Facts

  • Claude Code latest: 2.1.145 (2026-05-20, ~6 hours before publication)
  • Codex mobile: iPhone/Android preview (2026-05-14), QR-pair to Mac, 4M weekly users
  • Anthropic postmortem: v2.1.116 (2026-04-20), all three product changes fixed
  • Common direction: coding AI shifting from "tool" to "daily-use infrastructure"

TL;DR

In the third week of May 2026, the coding-AI industry shipped three back-to-back events that only happen once products enter "daily-use infrastructure". (1) Claude Code 2.1.145 landed six hours before this post — Fast mode silently switched to Opus 4.7 as default (introduced in 2.1.142), plugin dependency resolution went automatic, Windows right-click paste finally works, and 50+ other improvements shipped. (2) OpenAI Codex got remote control from iPhone / iPad / Android — pair your Mac with the ChatGPT app via QR code and drive a Codex session from anywhere. 4 million weekly active users. (3) Anthropic published the postmortem for a six-week stretch of Claude Code quality complaints, naming three overlapping product changes (reasoning-effort downgrade, caching bug, verbosity limit) and committing to a stricter rollout process. Fixed in v2.1.116. The thread that ties all three together is the same: coding AI is moving from "tool" to "daily-use infrastructure," and this post unpacks what that means for your own agent operations.

Bottom line: The third week of May 2026 was the week the coding-AI industry crossed from "tool" to "infrastructure" on three simultaneous fronts — feature surface, operating mode, and quality accountability. (1) The Claude Code 2.1.142 → 145 burst pushed Opus 4.7 into Fast mode, promoted background sessions to a first-class feature, and shipped OTel spans with agent_id / parent_agent_id so enterprise observability finally works. (2) Codex mobile dissolved the "I have to be at my terminal" assumption. (3) Anthropic's postmortem published the technical root cause of a six-week quality regression in detail. What ties them together is the weight of being used every day — the responsibility that comes with being infrastructure.

"Claude Code 2.1.145 dropped, Codex now runs on phones, Anthropic apologized for something — which one do I need to know about first?"— This post walks through the three coding-AI stories that landed between May 14 and May 20, 2026, using Anthropic's official engineering blog, the anthropics/claude-code GitHub Releases (CHANGELOG.md), OpenAI's official blog, and the OpenAI Codex changelog as primary sources. We then look at how each story should feed back into your own agent operations (Sales Claw included).

These three announcements look disconnected on the surface, but they share the same root: "running coding AI as daily infrastructure." Claude Code is doubling down on background sessions, plugin dependencies, and OTel to make daily operation viable; Codex is invading "non-terminal" time via remote control from phones; and Anthropic is taking on SRE-grade transparency for quality regressions. All three vendors are leaving the lab and moving into the infrastructure business — in the same week.

This article is structured as follows:

  1. What happened in coding AI during the third week of May 2026: the timeline
  2. What changed in Claude Code 2.1.145: unpacking the 142 → 145 burst release
  3. The day Codex came to your phone: driving a Mac Codex session from iPhone
  4. Anthropic's six-week quality postmortem: technical details of three product changes
  5. Three "infrastructure-era" trends visible across the three stories
  6. How to act: six concrete steps doable this weekend
  7. Risk and watchouts: enterprise blind spots to avoid
  8. What this changes for Sales Claw: implications for AI sales automation

Primary sources for this post: Anthropic Engineering Blog (April 23 postmortem), anthropics/claude-code GitHub Releases (CHANGELOG.md), OpenAI's official blog (Work with Codex from anywhere), and the OpenAI Codex changelog. For the v2.1.144 release shipped one day earlier, see our Claude Code v2.1.144 walkthrough; for cross-vendor comparisons that include Cursor Composer 2.5 and Antigravity 2.0, see our Google I/O 2026 roundup.

1. What happened in coding AI during the third week of May 2026

Timeline of what happened that week:

  1. 2026-05-14 (Thu): OpenAI Codex mobile announced — iPhone / iPad / Android can now remote-control a Mac Codex session via the ChatGPT app.
  2. 2026-05-18 (Mon): Cursor Composer 2.5 released (Kimi K2.5-based, $0.50 / $2.50 per 1M tokens).
  3. 2026-05-19 (Tue): Claude Code v2.1.144 ships — /resume now supports background sessions, startup hang reduced 75s → 15s, MCP pagination bug fixed, and 35+ other improvements.
  4. 2026-05-19 (Tue), same day: Google I/O 2026 opens (Gemini 3.5 Flash, Antigravity 2.0, Omni, and more).
  5. 2026-05-20 (Wed), morning: Claude Code v2.1.145 ships — claude agents --json, OTel agent_id, permission-prompt bypass fix, and 50+ improvements.
  6. Anthropic Engineering postmortem (originally 2026-04-23): traced six weeks of Claude Code quality regression to three product changes; resurfaced throughout May via Hacker News, InfoQ, and Fortune.
Cover illustration for the third week of May 2026 coding-AI roundup. Center headline reads `Coding AI from tool to infrastructure`. Three zones: Claude Code 2.1.145 (Fast mode Opus 4.7, background sessions, OTel agent_id, 50+ improvements); Codex mobile (iPhone/Android QR pairing for Mac remote control, 4M weekly users); Anthropic six-week postmortem (reasoning-effort downgrade, caching bug, verbosity limit). Bottom sticky note reads `Infrastructure era — the responsibility of being used every day`.
Figure: The third week of May 2026 in coding AI — three stories on one whiteboard

2. What changed in Claude Code 2.1.145 — unpacking the burst release

2.1.142 — Fast mode silently switched to Opus 4.7

[Official] 2.1.142 switched Fast mode's default model from Opus 4.6 to Opus 4.7. Opus 4.7 is Anthropic's newest flagship, released on 2026-04-16, with a 13% lift on coding benchmarks over Opus 4.6. The "Fast mode" label and "speed first" positioning stayed identical — only the underlying model changed, silently.

[Author's view] The signal is that Anthropic now believes Opus 4.7's latency and unit economics are good enough to belong inside Fast mode. The accompanying CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1 escape hatch tells you they expect at least some users to want to roll back if they perceive quality regressions. For heavy uses inside Sales Claw — form-body generation, approachGuardrails judging — now is the time to re-benchmark under Opus 4.7.

2.1.143 / 144 — background sessions became a real feature

[Official] 2.1.143 made background sessions preserve model and effort level after waking from idle; 2.1.144 added /resume support for background sessions, brought startup hang down from 75s to 15s, and fixed macOS Full Disk Access crashes.

Stacked together, these three fixes finally make "launch a task with claude --bg, let your Mac sleep overnight, /resume in the morning" actually work. Background sessions were labeled "experimental" before; with 2.1.144 they should be treated as a real, supported feature.

Item | Before 2.1.142 (experimental) | From 2.1.144 (real feature)
Background session /resume | Not available, list unreliable | Resumable via /resume
Startup hang (when api.anthropic.com is unreachable) | 75s wait | 15s timeout
macOS Full Disk Access crash | Frequent under ~/Documents | Fixed
Model/effort preservation (after waking from idle) | Sometimes reset | Preserved
MCP pagination | First page only | All pages

2.1.145 — OTel spans gained agent_id / parent_agent_id

[Official] OpenTelemetry tracing spans now carry agent_id and parent_agent_id attributes. Until now, if you ran multiple subagents in parallel, you could not tell from the trace which call belonged to which subagent. With those two attributes you can reconstruct the parent-child relationship and visualize it in Datadog / Honeycomb / New Relic.

[Author's view] This is Anthropic's formal handshake with enterprise SRE teams running Claude Code. If you run a production agent like Sales Claw, this is a must-have: latency distributions for parallel subagent execution and per-agent token cost finally become visible inside Datadog APM.
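
As a rough illustration of what the two attributes make possible, the sketch below rebuilds the parent-child tree from a list of exported spans. The attribute names agent_id and parent_agent_id come from the release notes; the span dict layout, the sample agent names, and both function names are assumptions made for this example, not the format any particular backend returns.

```python
from collections import defaultdict


def build_agent_tree(spans):
    """Link each agent to its parent and count spans per agent.

    Each span is assumed to be a dict with an "attributes" mapping.
    Returns (roots, children, span_counts).
    """
    children = defaultdict(set)      # parent agent_id -> child agent_ids
    span_counts = defaultdict(int)   # agent_id -> number of spans seen
    for span in spans:
        attrs = span.get("attributes", {})
        agent = attrs.get("agent_id")
        if agent is None:
            continue  # span without the new attribute, skip it
        span_counts[agent] += 1
        parent = attrs.get("parent_agent_id")
        if parent is not None:
            children[parent].add(agent)
    all_agents = set(span_counts) | set(children)
    child_agents = {c for kids in children.values() for c in kids}
    roots = sorted(all_agents - child_agents)
    return roots, {p: sorted(c) for p, c in children.items()}, dict(span_counts)


def format_tree(agents, children, depth=0):
    """Render the tree as indented lines, one agent per line."""
    lines = []
    for agent in agents:
        lines.append("  " * depth + agent)
        lines.extend(format_tree(children.get(agent, []), children, depth + 1))
    return lines
```

Feeding it spans exported from a tracing backend would give a quick text view of which subagent spawned which, before building a proper dashboard.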

2.1.145 — claude agents --json for external tooling

[Official] claude agents --json now lists live Claude sessions as JSON — usable by tmux-resurrect, status bars, session pickers, and anything that wants to parse Claude state.

[Author's view] Anthropic is formally acknowledging the "Claude Code as scriptable component" use case. Until now you had to scrape ANSI-formatted TUI output. For Sales Claw-style autonomous operation, this opens a clean pipeline: "emit live subagent count to Slack or Datadog as a metric" is a five-minute job now.

2.1.143 → 145 — automatic plugin dependency resolution

[Official] 2.1.143: claude plugin disable now refuses when an enabled plugin depends on the target, and claude plugin enable force-enables transitive dependencies. 2.1.145: the /plugin Discover and Browse preview now shows a plugin's commands / agents / skills / hooks / MCP/LSP servers before installation.

[Author's view] The signal here is that the plugin ecosystem has matured to npm-level. With dependency resolution and pre-install metadata both in place, enterprise rollouts can show infosec exactly what a given plugin will add.
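
The two behaviors described above (refuse to disable a plugin that something enabled still depends on, and pull in transitive dependencies on enable) can be sketched with a small dependency map. This is an illustration of the idea only, not Claude Code's implementation; the plugin names and both function names are hypothetical.

```python
def blockers_for_disable(target, enabled, depends_on):
    """Return enabled plugins that directly depend on target (empty means safe to disable)."""
    return sorted(
        plugin
        for plugin in enabled
        if plugin != target and target in depends_on.get(plugin, ())
    )


def enable_with_dependencies(target, enabled, depends_on):
    """Return the enabled set after enabling target plus all transitive dependencies."""
    result = set(enabled)
    visited = set()
    stack = [target]
    while stack:
        plugin = stack.pop()
        if plugin in visited:
            continue  # already handled, also protects against dependency cycles
        visited.add(plugin)
        result.add(plugin)
        stack.extend(depends_on.get(plugin, ()))
    return result
```

With a map such as {"lint-suite": ["formatter"], "formatter": ["core-utils"]}, disabling formatter is blocked by lint-suite, and enabling lint-suite on an empty set enables all three.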

2.1.143 — Windows right-click paste finally works

[Official] Right-click paste inside claude agents was broken on Windows Terminal and WSL; 2.1.143 fixes it.

[Author's view] For Windows-based Claude Code users, this is the moment "paste a git diff in, paste a log in" — daily mechanics that broke regularly before — finally works. Shift+Insert and Ctrl+V workarounds had been causing constant friction with CMD windows.

2.1.145 — permission-prompt bypass fixed

[Official] A permission-prompt bypass — "bare variable assignments to non-allowlisted environment variables in Bash commands were auto-approved" — has been fixed.

[Author's view] This is a security fix. Bash commands containing only a variable assignment (e.g., SECRET=$(cat .env)) were bypassing the permission prompt. For enterprise audit-log operation, that bypass was fatal. Sales Claw and similar agents should also audit whether internal secrets are being exposed through bash bridges.
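
For auditing your own bash bridges, a simple first pass is to flag commands that begin by assigning a variable outside an allowlist. The sketch below is a hypothetical policy check written for this article, not the logic Claude Code uses, and a regular expression is no substitute for a real shell parser; the allowlist contents are placeholders.

```python
import re

# A shell variable assignment at the start of a command: NAME=...
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=")


def assignment_needs_prompt(command, allowlisted_vars):
    """Return True when command starts by assigning a non-allowlisted variable.

    False means this check has no objection; other rules still apply.
    """
    match = ASSIGNMENT.match(command)
    if match is None:
        return False  # not a leading assignment
    return match.group(1) not in allowlisted_vars
```

The article's own example, SECRET=$(cat .env), is flagged under an allowlist of {"NODE_ENV"}, while NODE_ENV=production passes.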

3. The day Codex came to your phone — driving a Mac Codex from iPhone

What you can do

[Official] Capabilities Codex mobile officially supports:

  • Browse all threads — see every Codex session running on your Mac from your phone
  • Review outputs — screen updates, diffs, test results, and terminal output stream live
  • Approve commands — when Codex asks "should I run this?" approve it from your phone
  • Change models — switch between GPT-5.5 / GPT-5.5 Mini / o4-mini remotely
  • Start new tasks — dispatch "fix that issue" to Codex from wherever you are

Security model — files never leave the Mac

[Official] Files, credentials, permissions, and local setup stay on the Mac. The phone receives only "updates" — screen refreshes, terminal output, diffs, test results, and approval requests — in real time, end-to-end encrypted.

[Author's view] This is essentially the same model as Anthropic's Claude Code Remote Control (announced 2026-02), now applied to "my Mac → my phone" instead of "my Mac → my browser tab." It's designed to be compatible with enterprise data-exfiltration policies.

QR pairing flow

[Official] "Codex for Mac presents a QR code; scan it from iPhone / iPad / Android via the ChatGPT app". OpenAI's official blog has the step-by-step video.

[Author's view] QR pairing is "physical proximity required," which is enough security for this use case. Simpler than BLE pairing or OAuth, and compatible with the common enterprise pattern of "company Mac + personal phone."

Codex CLI updates landing alongside mobile

[Official] At the same time, the Codex CLI gained: codex remote-control as a new entrypoint, Bedrock auth via AWS console-login credentials, paged thread views for app-server clients (unloaded / summary / full), multi-environment view_image, and the Python SDK migrating to openai-codex / openai_codex.

[Author's view] codex remote-control is described as "a simpler entrypoint for starting a headless, remotely controllable app-server." For OSS like Sales Claw that may want to "call a Hosted Codex under the hood," this is the reference implementation.

Position relative to other vendors' mobile strategies

[Author's view] 2026 has been the year "coding AI × mobile" became a trend in earnest:

  • 2026-02: Anthropic announces Claude Code Remote Control (web browser + dedicated device)
  • 2026-04: OpenAI launches Codex Background Task (Mac desktop app)
  • 2026-05: OpenAI ships Codex Chrome extension, then Codex mobile
  • 2026-05-19: Google announces Antigravity 2.0 desktop + CLI + SDK at I/O 2026

[Author's view] All three vendors are dismantling the assumption that "coding AI lives in a terminal." Sales Claw, built on a serverless autonomous loop, needs to align with this direction (for example, building a Slack / Email / mobile-app surface for approving Sales Claw sessions).

4. Anthropic's "six weeks of regression" postmortem — what actually broke

Change 1: reasoning effort high → medium (2026-03-04)

[Official] On 2026-03-04 Anthropic changed Claude Code's default reasoning effort from high to medium — to address "latency issues where the UI appeared frozen during long thinking periods."

Item | Before 3/4 (high) | 3/4 → 4/7 (medium)
Default reasoning effort | high | medium
Perceived thinking time | Long, UI feels frozen | Short, snappy
Perceived code quality | High | Many users reported regression
Affected models | - | Sonnet 4.6 + Opus 4.6

[Official] Anthropic itself called this change "the wrong tradeoff". Reverted on 2026-04-07; the new default is now xhigh for Opus 4.7 and high for other models.

[Author's view] This is the classic "we traded away quality for latency" lesson. Sales Claw and similar agents are exposed to the same trap. "Initial response is slow → lower the reasoning effort" looks like a quick win, but the perceived "quality dropped" reaction tends to be more damaging in practice.

Change 2: caching bug (2026-03-26)

[Official] On 2026-03-26 Anthropic shipped a change to "clear thinking history once for sessions idle 1+ hour" — to reduce resume latency. A logic error caused thinking history to be cleared on every turn for the rest of the session.

[Official] Impact: "Claude felt forgetful and repetitive" and "usage limits drained faster than expected." Affected Sonnet 4.6 + Opus 4.6. Fixed on 2026-04-10 in v2.1.101.

[Author's view] This is the classic "caching optimization with unintended side effects" pattern. "Run once" / "persist" branching logic is famously hard to unit-test and tends to surface only in production. Sales Claw's autonomous loop has equivalent exposure — for example, "save cookie on first try, reuse afterward" logic could mistakenly re-login every time.
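
To make the failure mode concrete, here is a toy reproduction of the bug class: a "clear once after idle" rule implemented with a sticky flag that is never reset. It is an illustration written for this article, not Anthropic's code; the Session class and function names are hypothetical, and only the one-hour threshold comes from the postmortem summary above.

```python
IDLE_LIMIT_SECONDS = 3600  # "idle 1+ hour"


class Session:
    def __init__(self):
        self.thinking_history = []
        self.was_idle = False


def start_turn_buggy(session, idle_seconds):
    if idle_seconds >= IDLE_LIMIT_SECONDS:
        session.was_idle = True
    # Bug: the flag is never reset, so history is wiped on every later turn.
    if session.was_idle:
        session.thinking_history.clear()


def start_turn_fixed(session, idle_seconds):
    # Clear exactly once, on the turn that resumes after the idle gap.
    if idle_seconds >= IDLE_LIMIT_SECONDS:
        session.thinking_history.clear()


def simulate(start_turn, idle_gaps):
    """Run one turn per idle gap and return how many thoughts survive."""
    session = Session()
    for turn, idle_seconds in enumerate(idle_gaps, start=1):
        start_turn(session, idle_seconds)
        session.thinking_history.append(f"thought from turn {turn}")
    return len(session.thinking_history)
```

A regression test that runs several turns after the idle gap, rather than only the resuming turn, is what separates the two versions.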

Change 3: verbosity limit (2026-04-16)

[Official] On 2026-04-16 Anthropic added to the system prompt: "Keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail." Intent: cut the verbose output tokens of Opus 4.7.

[Official] Broader testing measured a 3% drop in code-generation quality across Opus 4.6 / 4.7. Affected Sonnet 4.6 + Opus 4.6 + Opus 4.7. Reverted on 2026-04-20.

[Author's view] This is the lesson that "a single line in the system prompt has unexpectedly wide blast radius." The intent was just to shorten verbosity, but it leaked into the reasoning expression used between tool calls and ended up degrading code generation. Sales Claw-style autonomous loops carry the same risk: "make form bodies more concise" could end up reducing the granularity of the approachGuardrails judge.

Process changes that came out of the postmortem

[Official] Anthropic committed to the following process improvements:

  1. Expanded Code Review eval capabilities — broader Claude Code evaluation infrastructure
  2. Stricter system-prompt controls — tighter approval process for system-prompt changes
  3. Broader per-model evaluations — split evaluation per model (Sonnet 4.6 / Opus 4.6 / Opus 4.7)
  4. Gradual rollout protocols — canary → 1% → 10% → 100% rollouts applied to product changes

[Author's view] These are standard SRE practice. Anthropic publicly committing to "run Claude Code with SRE process" is the headline. Sales Claw and similar OSS agents in production should adopt the same.
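
For an OSS agent that wants the same staged pattern, a common approach is deterministic hash bucketing: each user lands in a fixed bucket, so anyone included at 1% stays included at 10% and 100%. The sketch below is one generic way to do it, not a description of Anthropic's rollout system; the function name and the change_name parameter are hypothetical.

```python
import hashlib

ROLLOUT_STAGES = [1, 10, 100]  # percent of users, applied after the canary step


def in_rollout(user_id, change_name, percent):
    """Return True when user_id falls inside the first `percent` of buckets."""
    key = f"{change_name}:{user_id}".encode("utf-8")
    # 10,000 buckets gives a resolution of 0.01 percent.
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 10_000
    return bucket < int(percent * 100)
```

Including change_name in the hash key means different changes sample different user subsets, so the same early cohort does not absorb every experiment.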

Timeline of Anthropic Claude Code's six-week quality regression and the three overlapping product changes. Center headline 'Six-week regression — chain of three product changes'. Left-to-right timeline: 2026-03-04 reasoning effort high→medium, 2026-03-26 caching bug, 2026-04-16 verbosity limit, 2026-04-20 v2.1.116 fixes all three, 2026-04-23 postmortem published. Each change has 'symptom' and 'fix date' labeled below. Bottom sticky note: 'overlapping impact made the issue widespread; individually each might have gone unnoticed.'
Figure 2: Anthropic six-week regression — timeline of three product changes

5. Three "infrastructure-era" trends visible across the three stories

Trend 1: "tool" → "infrastructure"

[Author's view] Looking at the 50+ new features and 145+ bug fixes across Claude Code 2.1.142 → 145, more than half of the new features are about "making existing features durable." Examples:

  • Background session /resume — once started, expected to "run to completion"
  • Startup hang 75s → 15s — even the first 60 seconds of your morning get reclaimed
  • OTel agent_id span — Datadog and Honeycomb can now accumulate long-term metrics
  • Automatic plugin dependency resolution — fits enterprise deploy pipelines

These are infrastructure-grade requirements: "used every day," "never stops at night," "observed by SRE," "approved by infosec." The industry has moved from the 2024 era of "wow, it works" to the 2026 era of "zero downtime."

Trend 2: freedom from place and time

[Author's view] Codex mobile, Claude Code background sessions, and Antigravity standalone desktop — all three vendors are dismantling the assumption that "coding happens while you sit at a terminal."

  • Approve a Mac Codex session from iPhone — coding progresses on the train
  • Claude Code background session — your build finishes while you sleep
  • Antigravity standalone + CLI + SDK — start agents from a program without going through an IDE

This is the new model of "dispatch to coding AI, come back later to see the result." Sales Claw, in the long run, needs to align with this direction.

Trend 3: SRE-grade quality accountability

[Author's view] Anthropic publishing its postmortem is the symbolic moment of "coding-AI vendors adopting SRE process." Until now, AI quality regression was discussed only via user perception. Anthropic broke that pattern by:

  • Disclosing the three changes and their dates at technical depth
  • Reporting the "3% code-quality drop" as a quantitative number
  • Saying "the wrong tradeoff" and "we let users down" in unambiguous terms
  • Listing four prevention process changes as a commitment

These are baseline SRE / Cloud-SLA practices, but the AI industry has been able to skip them under the excuse that "we cannot explain what the model is doing." With this move, Anthropic has raised the bar for accountability across the industry.

Three-column structural map of coding-AI trends in the third week of May 2026. Center headline 'Three infrastructure-era trends'. Column 1 (tool→infrastructure): 'Claude Code 2.1.145 OTel agent_id', 'Background session /resume', 'Automatic plugin dependency'. Column 2 (freedom from place and time): 'Codex mobile iPhone QR pairing', 'Anthropic Remote Control', 'Antigravity standalone'. Column 3 (SRE accountability): 'Anthropic postmortem publication', '3% drop reported quantitatively', 'gradual rollout introduced'. Bottom sticky note: 'Shared root = taking responsibility as daily-use AI.'
Figure 1: structural map of three infrastructure-era trends in coding AI

6. How to act — six steps this weekend

Step 1: update Claude Code to 2.1.145

[Official] Claude Code is distributed via the @anthropic-ai/claude-code npm package. Check, then upgrade:

# Check current version
claude --version

# Upgrade to latest
npm install -g @anthropic-ai/claude-code@latest

# Re-check
claude --version
# → 2.1.145 (or newer patch)

[Author's view] If you are jumping from 2.1.142 or earlier, back up ~/.claude/.credentials.json before upgrading. 2.1.143 added a fix for "non-array scopes value in .credentials.json hangs the CLI on startup." If your old credentials are still there, re-login is safer.
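
A minimal sketch of that precaution: copy the file aside, then look for any "scopes" key whose value is not a JSON array. The internal layout of .credentials.json is not documented in this article, so the walker simply searches every nesting level; both function names are hypothetical helpers, not part of Claude Code.

```python
import json
import shutil
from pathlib import Path


def backup_file(path):
    """Copy path to a sibling file with a .bak suffix and return the backup location."""
    source = Path(path).expanduser()
    target = source.with_name(source.name + ".bak")
    shutil.copy2(source, target)
    return target


def find_non_array_scopes(node, location="$"):
    """Return JSON paths of every "scopes" key whose value is not a list."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{location}.{key}"
            if key == "scopes" and not isinstance(value, list):
                problems.append(child)
            problems.extend(find_non_array_scopes(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            problems.extend(find_non_array_scopes(value, f"{location}[{index}]"))
    return problems
```

Typical use would be backup_file("~/.claude/.credentials.json") followed by find_non_array_scopes(json.loads(text)) on the file contents; a non-empty result suggests logging in again rather than reusing the old file.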

Step 2: try Fast mode on Opus 4.7 for a week

With Fast mode now defaulting to Opus 4.7, watch for "is quality lower?" or "is cost higher?" over a week.

# Check current model setting
claude /model

# Verify Fast mode (Shift+Tab to cycle permission modes, then F to enter Fast mode)
# Run 5-10 tasks in Fast mode

# Pin to Opus 4.6 if you observe quality regression
# Mac/Linux
export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

# Windows PowerShell
$env:CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE = "1"

Step 3: inventory live sessions with claude agents --json

# Dump all background sessions as JSON
claude agents --json

# Use jq to extract "awaiting-input" sessions only (Sales Claw operation example)
claude agents --json | jq '.[] | select(.status == "awaiting_input")'

# Call from a tmux status bar (~/.tmux.conf)
# set -g status-right "#(claude agents --json | jq 'length') agents"

[Author's view] This is step one of "SRE-monitor Claude Code." In Sales Claw operation you can build "alert Slack when more than N sessions are awaiting input" in five minutes.
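
The "alert when more than N sessions are awaiting input" idea can be sketched as follows. It assumes the command prints a JSON array of objects with a "status" field whose waiting value is "awaiting_input", mirroring the jq filter above; the actual schema should be checked against real output before relying on it. The function names are hypothetical, and posting the message to Slack is left out.

```python
import json
import subprocess


def read_live_sessions():
    """Run `claude agents --json` and return its raw stdout."""
    completed = subprocess.run(
        ["claude", "agents", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def awaiting_input_alert(agents_json, threshold, status_value="awaiting_input"):
    """Return an alert string when more than `threshold` sessions wait, else None."""
    sessions = json.loads(agents_json)
    waiting = [s for s in sessions if s.get("status") == status_value]
    if len(waiting) <= threshold:
        return None
    return f"{len(waiting)} Claude Code sessions are awaiting input (threshold {threshold})"
```

Keeping the parsing separate from the subprocess call makes the threshold logic testable with canned JSON.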

Step 4: pair Codex mobile via QR

  1. On Mac, launch Codex CLI (codex command)
  2. Codex displays a QR code
  3. Update the ChatGPT app on iPhone / iPad / Android to the latest version
  4. In the Codex section of the ChatGPT app, choose "Pair with Mac"
  5. Scan the QR code with the camera
  6. Pairing complete; all Mac Codex sessions show up on your phone

[Author's view] If you operate under a BYOD policy, check with IT before pairing. Files stay on the Mac, but screen updates and diffs do transfer to the phone.

Step 5: export OTel agent_id to Datadog / Honeycomb

[Official] Claude Code exports OTel via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable:

# Option A: stream to a local Datadog Agent (set option A or option B, not both)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_HEADERS="dd-api-key=<YOUR_API_KEY>"
export OTEL_SERVICE_NAME="claude-code-sales-claw"

# Option B: stream to Honeycomb (these exports overwrite the option A values if both are run)
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io"
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<YOUR_API_KEY>"

# Start Claude Code; claude_code.tool spans now export with agent_id / parent_agent_id
claude --bg "Run all tests for the project and summarize the results"

Filtering by service:claude-code-sales-claw in the Datadog APM dashboard now shows parallel subagent traces split out by agent_id.

Step 6: audit your system prompts for verbosity instructions

[Author's view] In light of Anthropic's postmortem, grep your own system prompts and CLAUDE.md files for "be concise"-type instructions. Sales Claw applies the same audit to its own prompts (see section 8).
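
A minimal sketch of such an audit, assuming prompts live in plain-text files: it scans each line for a few brevity phrasings, including the "≤N words" form quoted from the postmortem. The pattern list is a starting point chosen for this example, not an exhaustive or official list, and the function names are hypothetical.

```python
import re
from pathlib import Path

BREVITY_PATTERN = re.compile(
    "|".join(
        [
            r"\bbe concise\b",
            r"\bkeep (it|responses?|answers?|text) (short|brief)\b",
            r"≤\s*\d+\s*words",
            r"\b(at most|no more than|under)\s+\d+\s+words\b",
        ]
    ),
    re.IGNORECASE,
)


def find_brevity_instructions(text):
    """Return (line_number, line) pairs that look like verbosity limits."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if BREVITY_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_files(paths):
    """Map each readable file path to its list of hits, skipping files with none."""
    report = {}
    for path in paths:
        hits = find_brevity_instructions(Path(path).read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Each hit is a candidate for review, not an automatic deletion: the question to ask is whether the instruction constrains the final output format or leaks into the reasoning between tool calls.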

Six-step implementation checklist for absorbing the third-week May 2026 changes. Center headline 'Six steps — finish this weekend.' Numbered items: 1) update Claude Code to 2.1.145 (npm install -g @anthropic-ai/claude-code@latest); 2) try Fast mode on Opus 4.7 for a week (pin to 4.6 if needed); 3) inventory sessions with claude agents --json (jq for awaiting_input); 4) pair Codex mobile via QR (check IT for BYOD); 5) stream OTel agent_id to Datadog / Honeycomb; 6) audit system prompts for verbosity instructions (the Anthropic 4/16 trap). Bottom sticky note: 'Six done = third-week May absorbed.'
Figure 3: six-step implementation checklist for this weekend
Timeline of major coding-AI events from March to May 2026 across three lanes: top (Claude Code releases and Anthropic postmortem), middle (OpenAI Codex events), bottom (Google / Cursor). 3/4 reasoning effort high→medium, 3/26 caching bug, 4/16 verbosity limit, 4/20 v2.1.116 fix, 4/23 postmortem published, 5/14 Codex mobile, 5/18 Cursor Composer 2.5, 5/19 Claude Code 2.1.144 + Google I/O 2026, 5/20 Claude Code 2.1.145.
Figure 4: coding-AI timeline from March to May 2026 (Python timeline)

7. Risk and watchouts — enterprise blind spots

Risk 1: Fast mode → Opus 4.7 cost variance

[Author's view] With Fast mode now defaulting to Opus 4.7, heavy Fast-mode users may see "unexpected monthly cost increases." Opus 4.7 is 13% better on coding benchmarks than Opus 4.6, but per-token price is "flat or slightly higher" in many cases. If you have pipelines that hit Fast mode frequently in Sales Claw operation, pull a cost report this weekend and compare before/after.

Mitigation: pin Fast mode to Opus 4.6 with CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1 and re-benchmark cost vs. quality on your own workload.
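
The before/after comparison is simple arithmetic once you have token counts from your usage report. The sketch below shows the calculation only; every price and token figure passed in is a placeholder you must replace with numbers from your own billing data, since this article does not state per-token prices.

```python
def monthly_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in currency units, given token counts and prices per 1M tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


def cost_change_percent(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100
```

Because a different model may also emit a different number of output tokens for the same task, compare real token counts from both runs rather than reusing one set of counts with two prices.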

Risk 2: Codex mobile BYOD conflict

[Author's view] Codex mobile keeps "files and credentials on the Mac," but screen updates and diffs do transfer to the phone. Under a BYOD policy this means "some of your business code is now displayed on a personal phone" — and that usually requires IT approval.

Mitigation: company-issued iPhone removes the BYOD problem. Under personal-only policies, "notify IT before pairing and log pairing history in audit" is the safer pattern.

Risk 3: PII risk in OTel agent_id

[Author's view] Claude Code 2.1.145 added agent_id and parent_agent_id to OTel spans — but if your subagent names contain customer names, those names now flow into Datadog / Honeycomb / external OTel collectors. Example for Sales Claw: --agent customer-acme-form-runner ends up looking like PII leakage in observability.

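Mitigation: use de-identified values for subagent names ("customer ID," "customer hash," "sequence number"). Because agent_id is a span attribute rather than a resource attribute, scrub it at the OTel Collector with a span-level processor such as the attributes, redaction, or transform processor.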
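
One way to produce the "customer hash" form is a keyed hash, so the same customer always maps to the same opaque name while the original cannot be read back from the trace. This is a sketch under assumptions: the naming scheme, the function name, and the 12-character truncation are choices made for this example, and the key must be stored outside the observability pipeline.

```python
import hashlib
import hmac


def deidentified_agent_name(customer_name, role, secret_key):
    """Build an agent name that carries a keyed hash instead of the customer name."""
    digest = hmac.new(
        secret_key, customer_name.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # 12 hex characters keeps names short; lengthen if collisions matter.
    return f"customer-{digest[:12]}-{role}"
```

Using HMAC rather than a bare hash matters here: customer names are low-entropy, so an unkeyed SHA-256 could be reversed by hashing a list of likely names.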

Risk 4: over-generalizing the postmortem lessons

[Author's view] Anthropic's three cases are Claude Code-specific. They do not translate one-to-one to your own production agent. In particular, "verbosity limit lowered code quality" is a phenomenon tied to Claude Code's specific system-prompt structure; Sales Claw has an independent approachGuardrails judge model, so the dynamics differ.

Mitigation: read the postmortem and map "what similar patterns exist in our system" case by case (do not swallow it as a universal principle).

Impact-probability matrix of four major risks from the 2026-05 coding-AI infrastructure era. Y-axis is impact (low → high), x-axis is probability (low → high). Top right (high impact, high probability): 'Fast mode Opus 4.7 cost variance'. Bottom right: 'Codex mobile BYOD conflict'. Top left: 'OTel agent_id PII risk'. Center: 'over-generalizing postmortem lessons'. Each risk is labeled with a numbered mitigation.
Figure 5: impact-probability matrix of four major risks (Python chart)

8. What this changes for Sales Claw — implications for AI sales automation

Implication 1: audit Sales Claw's system prompts

[Author's view] Sales Claw runs three categories of system prompt internally:

  1. Form-body generation prompt — combines sales NG detection with CAPTCHA-aware logic
  2. approachGuardrails judge prompt — an independent model that says "is this body safe to send?"
  3. Audit-log summarization prompt — turns the send log into an executive report

We are going to grep all three for "be concise"-type instructions in light of the Anthropic postmortem. Anything that touches reasoning will be rephrased as a final-output-format instruction.

Implication 2: add agent_id to Sales Claw OTel spans

[Author's view] Sales Claw also runs parallel subagents (form-submission worker, approachGuardrails judge, audit-log writer). Following Claude Code's lead, we will add agent_id / parent_agent_id to our OTel spans. That makes the following visible:

  • "Latency distribution of parallel subagent execution" in Datadog APM
  • "Customers where approachGuardrails judge runs slow" become identifiable
  • "Trace parent-child relationships at the moment the form-submission worker stops on CAPTCHA detection" become reconstructible

This is planned for Sales Claw's next minor release.

Implication 3: design UI for "non-touching" time

[Author's view] Aligning with Codex mobile and Claude Code background sessions, we are promoting Sales Claw's "overnight auto-run + morning approval" flow to an official operation pattern:

  • At midnight, Sales Claw prepares the next day's candidate list
  • At 9 AM, Slack notification: "today queue: 47, approachGuardrails all pass, approve?"
  • Approval is a Slack button (PC or phone); reject with a reason
  • Sending completes within an hour of approval; results returned to the Slack thread

The goal is "human-touch time per day is 5 minutes (approval only)." This aligns Sales Claw with the three coding-AI vendors' direction and codifies a standard "infrastructure-era" UI for AI sales automation.
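
The 9 AM approval message could be built as a Slack Block Kit payload along these lines. This is a sketch, not Sales Claw's shipped code: the function name, the action_id values, and the batch_id parameter are hypothetical, and sending the payload plus handling the button callback are left out.

```python
def build_approval_message(queue_size, all_guardrails_passed, batch_id):
    """Return a Slack message payload with Approve and Reject buttons."""
    status = "all pass" if all_guardrails_passed else "needs review"
    summary = f"Today queue: {queue_size}, approachGuardrails {status}. Approve?"
    return {
        "text": summary,  # fallback text for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "action_id": "approve_batch",
                        "value": batch_id,
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Reject"},
                        "style": "danger",
                        "action_id": "reject_batch",
                        "value": batch_id,
                    },
                ],
            },
        ],
    }
```

Carrying batch_id in each button's value lets the callback handler identify which queue was approved without keeping server-side state about the message.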

Feature matrix comparing three coding-AI vendors (Claude Code / OpenAI Codex / Cursor) and Sales Claw. Y-axis is feature category, x-axis is the four tools. Comparison items: background sessions, mobile remote control, OTel agent_id, plugin dependency resolution, postmortem transparency, SLA publication, built-in approachGuardrails, self-hostable. Each cell labeled ✓ (supported), ▲ (partial), × (unsupported) with official / author's-view justification.
Figure 6: feature matrix of three coding-AI vendors + Sales Claw (Python chart)

Bringing Claude Code 2.1.145, Codex mobile, and Anthropic's postmortem into your operations

Free and MIT-licensed. You can also try the live demo without installing anything.


Looking for the Japanese-language original? It is available here.

Frequently asked questions

The third week of May 2026 in coding AI in one sentence?
"The week coding AI moved from being a tool to being daily-use infrastructure." Three back-to-back events: (1) Claude Code 2.1.145 (Fast mode defaults to Opus 4.7, claude agents --json, OTel spans gained agent_id / parent_agent_id, 50+ improvements); (2) OpenAI Codex mobile (iPhone / iPad / Android remote-control a Mac Codex via QR pairing, 4M weekly active users); (3) Anthropic's "six-week quality regression" postmortem keeps getting re-discussed — three product changes (reasoning effort high→medium, caching bug, verbosity limit) chained together, all fixed in v2.1.116. The shared thread: the responsibility that comes from being used every day. Six actions for absorbing this week: upgrade, validate Fast mode, use claude agents --json, pair Codex mobile, export OTel agent_id, audit your system prompts for verbosity.
What changed in Claude Code 2.1.145?
2.1.145 was released ~6 hours before this post with 20+ new features plus bug fixes. The operationally important ones: (1) claude agents --json lists live sessions as JSON (tmux-resurrect, status bar, session pickers); (2) claude_code.tool OTEL spans gained agent_id and parent_agent_id, so parallel subagent execution finally renders correctly in Datadog / Honeycomb; (3) the /plugin Discover/Browse preview shows a plugin's commands / agents / skills / hooks / MCP/LSP servers before installation; (4) a permission-prompt bypass — "bare variable assignments to non-allowlisted environment variables in Bash were auto-approved" — has been fixed; (5) Stop/SubagentStop hook input added background_tasks and session_crons fields; (6) cross-project resume hint now works on Windows PowerShell 5.1 (using ";" as the command separator); (7) Read tool returns a truncated "PARTIAL view" notice instead of a hard error when a whole-file read exceeds token limits, plus 50+ smaller improvements. Cumulative 2.1.142 → 145 is 145+ improvements.
Fast mode now defaults to Opus 4.7 — what changes?
In 2.1.142, the Fast-mode default model switched from Opus 4.6 to Opus 4.7. Opus 4.7 is Anthropic's latest flagship (released 2026-04-16), with a 13% lift over Opus 4.6 on a 93-task coding benchmark; Opus 4.7 also solved four tasks that neither Opus 4.6 nor Sonnet 4.6 could. The "Fast mode" label stays; only the underlying model changed, and it changed silently. The signal: Anthropic believes Opus 4.7's latency and unit economics are good enough for Fast mode. If you perceive a quality regression, the escape hatch CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1 is available. Sales Claw users should re-benchmark their heavy-use paths (form-body generation, approachGuardrails judging) under Opus 4.7 for a week and compare quality and cost.
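One way to frame the week-long re-benchmark is to log each task run with a pass/fail flag and a cost, then compare the two models side by side. The sketch below is a minimal version of that comparison; the record fields, the model labels, and all the numbers are hypothetical placeholders, not measurements.

```python
from statistics import mean

# Hypothetical run log: one record per benchmarked task execution.
RUNS = [
    {"model": "opus-4.6", "passed": True, "cost_usd": 0.10},
    {"model": "opus-4.6", "passed": False, "cost_usd": 0.12},
    {"model": "opus-4.6", "passed": True, "cost_usd": 0.08},
    {"model": "opus-4.6", "passed": True, "cost_usd": 0.10},
    {"model": "opus-4.7", "passed": True, "cost_usd": 0.12},
    {"model": "opus-4.7", "passed": True, "cost_usd": 0.14},
    {"model": "opus-4.7", "passed": True, "cost_usd": 0.10},
    {"model": "opus-4.7", "passed": True, "cost_usd": 0.12},
]


def summarize_by_model(runs):
    """Return pass rate and average cost per model."""
    grouped = {}
    for run in runs:
        grouped.setdefault(run["model"], []).append(run)
    return {
        model: {
            "pass_rate": mean(1.0 if r["passed"] else 0.0 for r in rs),
            "avg_cost_usd": round(mean(r["cost_usd"] for r in rs), 4),
        }
        for model, rs in grouped.items()
    }


summary = summarize_by_model(RUNS)
for model, stats in summary.items():
    print(model, stats)
```

Keeping quality and cost in the same table makes it harder to accept a cheaper model that quietly fails more tasks, or a better one whose cost you never measured.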
What can Codex mobile do? How does iPhone-to-Mac pairing work?
Codex mobile (announced 2026-05-14 by OpenAI) lets the ChatGPT app on iPhone / iPad / Android remote-control a Mac Codex session. Concretely you can: (1) browse all live Codex threads on your Mac from your phone, (2) review screen updates / diffs / test results / terminal output in real time, (3) approve or reject Codex command requests from your phone, (4) switch between GPT-5.5 / GPT-5.5 Mini / o4-mini remotely, (5) dispatch new tasks while you are away from your desk. The mechanism: Codex CLI on the Mac shows a QR code, you scan it with the ChatGPT app, and pairing is complete; it is a simple OAuth-style flow. Security model: files, credentials, permissions, and local setup stay on the Mac; only screen updates, diffs, test results, and approval requests flow to the phone (encrypted in transit). This is the same idea as Anthropic's Claude Code Remote Control (2026-02). Currently in preview on iOS and Android only (Windows "coming soon"). Codex has 4M weekly active users (per OpenAI).
What is in Anthropic's six-week quality postmortem?
Anthropic's Engineering Blog (2026-04-23) traced a six-week Claude Code quality regression to three product changes. (1) 2026-03-04: default reasoning effort was switched from high to medium to mitigate latency that made the UI look frozen; Anthropic itself called this "the wrong tradeoff" and reverted it on 2026-04-07 (Opus 4.7 now defaults to xhigh, other models to high). (2) 2026-03-26: a change meant to "clear thinking history once for sessions idle 1+ hour" had a logic bug that kept clearing it on every subsequent turn, so "Claude felt forgetful, usage limits drained fast"; it affected Sonnet 4.6 and Opus 4.6 and was fixed on 2026-04-10 in v2.1.101. (3) 2026-04-16: a system-prompt verbosity limit ("≤25 words between tool calls, ≤100 words for final responses") caused a 3% code-quality drop across Opus 4.6 / 4.7 in broader tests; it affected Sonnet 4.6, Opus 4.6, and Opus 4.7 and was reverted on 2026-04-20. All three are resolved as of v2.1.116 (2026-04-20), and usage limits were reset for all subscribers. Process changes: an expanded Code Review eval, stricter system-prompt controls, broader per-model evaluations, and gradual rollout protocols.
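The second change is a classic "run once" bug, and a toy reproduction makes the failure mode easier to recognize in your own code. The sketch below is not Anthropic's implementation; the Session class, the stale flag, and both turn functions are hypothetical names chosen to illustrate how a sticky flag turns a one-time clear into a clear on every turn.

```python
IDLE_THRESHOLD_S = 3600  # "idle 1+ hour", as summarized above


class Session:
    def __init__(self):
        self.history = []
        self.last_active = 0.0
        self.stale = False  # hypothetical flag, used only by the buggy variant


def turn_buggy(session, now, message):
    if now - session.last_active >= IDLE_THRESHOLD_S:
        session.stale = True
    if session.stale:
        # Bug: the flag is never reset, so history is wiped on every later turn.
        session.history.clear()
    session.history.append(message)
    session.last_active = now


def turn_fixed(session, now, message):
    if now - session.last_active >= IDLE_THRESHOLD_S:
        # The decision and the action share one branch, so the clear happens once.
        session.history.clear()
    session.history.append(message)
    session.last_active = now


def replay(turn, timestamps):
    """Run a sequence of turns at the given timestamps and return the history."""
    session = Session()
    for i, ts in enumerate(timestamps):
        turn(session, ts, f"msg-{i}")
    return session.history


TIMESTAMPS = [0, 10, 4000, 4010, 4020]  # one long idle gap before the third turn
buggy_history = replay(turn_buggy, TIMESTAMPS)
fixed_history = replay(turn_fixed, TIMESTAMPS)
print(buggy_history)
print(fixed_history)
```

A unit test that only checks the first turn after the idle gap passes for both variants; the difference only shows up when the test keeps going for a few more turns, which is why this class of bug tends to surface in production.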
Why is OTel agent_id / parent_agent_id valuable?
Claude Code 2.1.145 added agent_id and parent_agent_id to claude_code.tool OTel spans. Before this, traces could not answer "which subagent made which call?" when subagents ran in parallel. With these attributes, parent-child relationships can be reconstructed and visualized in Datadog / Honeycomb / New Relic. Concrete benefits: (1) the latency distribution of parallel subagent execution becomes visible in Datadog APM, (2) token consumption can be aggregated per agent, (3) when a subagent fails, its trace can be reconstructed under the dispatching Agent tool span, (4) Claude Code traces fit into standard enterprise observability tools. Setup: define OTEL_EXPORTER_OTLP_ENDPOINT and start claude (compatible with standard Datadog Agent / Honeycomb OTel export configuration). Sales Claw will mirror this on its own OTel spans in the next minor release.
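To make the parent-child reconstruction concrete, here is a minimal sketch that works on span records after they have been exported and flattened into dictionaries. Only agent_id and parent_agent_id come from the release described above; the "tool" and "duration_ms" fields, and the sample values, are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical, already-exported span records.
SPANS = [
    {"agent_id": "main", "parent_agent_id": None, "tool": "Agent", "duration_ms": 40},
    {"agent_id": "sub-a", "parent_agent_id": "main", "tool": "Bash", "duration_ms": 1200},
    {"agent_id": "sub-a", "parent_agent_id": "main", "tool": "Read", "duration_ms": 300},
    {"agent_id": "sub-b", "parent_agent_id": "main", "tool": "Grep", "duration_ms": 800},
]


def build_agent_tree(spans):
    """Map each parent agent to the sorted list of its child agents."""
    children = defaultdict(set)
    for span in spans:
        parent = span["parent_agent_id"]
        if parent is not None:
            children[parent].add(span["agent_id"])
    return {parent: sorted(kids) for parent, kids in children.items()}


def total_duration_by_agent(spans):
    """Sum tool-call duration per agent."""
    totals = defaultdict(int)
    for span in spans:
        totals[span["agent_id"]] += span["duration_ms"]
    return dict(totals)


tree = build_agent_tree(SPANS)
totals = total_duration_by_agent(SPANS)
print(tree)
print(totals)
```

The same grouping key works for token counts or error rates, provided the exported spans carry those values.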
What can we learn from Anthropic's postmortem if we run our own AI agent?
Three lessons. (1) "Be concise" instructions in system prompts have a wide blast radius: "shorter is better" instructions can leak into how the model expresses its reasoning and degrade code generation (Anthropic's 2026-04-16 change). Sales Claw's form-body prompts, approachGuardrails judge, and audit-log summarization should be grepped for "be concise"-style instructions, which should then be rephrased as final-output-format instructions. The distinction: instructions that touch reasoning are bad; instructions about output format are fine. (2) Cache optimization carries production-only bugs: "run once vs. persist" branching is famously hard to unit-test. A Sales Claw equivalent is "save cookie on first try, reuse afterward" logic that ends up logging in again every time. (3) The latency-vs-quality tradeoff is easily misjudged: "initial render is slow → reduce verbosity" is the wrong frame, because a perceived quality regression is more damaging. Latency should be addressed in other layers (streaming UI, parallel execution, cache tiers). Adopting SRE process (canary → gradual rollout, per-model evaluations, transparent postmortems) helps prevent recurrence in your own product.
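The prompt audit in lesson (1) can start as a simple pattern scan. The sketch below is one possible starting point; the phrase list is an assumption about what "be concise"-style instructions look like, and the prompt names and texts are hypothetical examples rather than Sales Claw's actual prompts.

```python
import re

# Assumed phrasings of brevity instructions; extend for your own prompts.
PATTERNS = [
    r"\bbe (?:concise|brief)\b",
    r"\bkeep (?:it|responses|answers) short\b",
    r"\b(?:at most|under|no more than|fewer than) \d+ words\b",
    r"≤\s*\d+ words\b",
]
BREVITY = re.compile("|".join(PATTERNS), re.IGNORECASE)


def find_brevity_instructions(prompts):
    """Return {prompt_name: [matched phrases]} for prompts with any match."""
    hits = {}
    for name, text in prompts.items():
        matches = [m.group(0) for m in BREVITY.finditer(text)]
        if matches:
            hits[name] = matches
    return hits


# Hypothetical prompts, for illustration only.
PROMPTS = {
    "form_body": "You write outreach form bodies. Be concise in everything you do.",
    "guardrail_judge": 'Judge the draft. Reply with JSON: {"verdict": "...", "reason": "..."}.',
    "audit_summary": "Summarize the log. Use ≤25 words between tool calls.",
}

hits = find_brevity_instructions(PROMPTS)
print(hits)
```

A match is only a candidate for review: a word limit that applies to the final output format may be fine, while one that constrains intermediate reasoning is the kind worth rewriting.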
Impact on autonomous agents like Sales Claw?
Three categories. (1) Audit our system prompts for verbosity: to avoid the trap Anthropic hit on 2026-04-16, grep Sales Claw's form-body generation prompt, approachGuardrails judge prompt, and audit-log summarization prompt for "be concise" instructions (instructions about output format are OK; instructions that touch reasoning are not). (2) Add OTel agent_id to Sales Claw spans, so that parallel subagent execution (form-submission worker, approachGuardrails judge, audit-log writer) finally renders correctly in Datadog. This is planned for the next minor release. (3) Design the UI for "non-touching" time: in line with Codex mobile and Claude Code background sessions, promote Sales Claw's "midnight prep → 9 AM Slack approval → completion within 1 hour" flow to an official operational pattern. Internal measurement: 1 SDR completing 30-50 form submissions/day in 8:21 of operator time (5 days × 1 SDR, 92.4% approval rate). Sales Claw uses a policy-gated autonomous design that lowers misdelivery and TOS-violation risk through pre-send automated checks, sales-NG detection, CAPTCHA-aware halts, send-rate limits, audit logging, and auto-stop conditions.

References

This article draws on official X accounts and official documentation as its primary sources.


About the author

中澤 圭志

Sales Claw maintainer

Designs and develops Sales Claw. Writes from the field on B2B sales automation and applied AI.
