AI NewsClaude Opus 4.8

Claude Opus 4.8 Became an "Honest AI" — Benchmarks, New Features, and Cost, Explained for Non-Experts

On May 28, 2026, Anthropic released its flagship model Claude Opus 4.8 — less than two months after 4.7. The real story this time is not raw intelligence but "honesty": the chance it silently lets flaws in its own code through dropped to roughly a quarter. This article walks general readers through the actual numbers across six official benchmarks, the cost trap hiding behind "flat pricing," the three new features (Fast Mode / Dynamic Workflows / Effort Control), and how individuals, SMBs, and enterprises should get started — framed with the metaphor of a new hire growing into a senior colleague.

中澤圭志

@keishi_nakazawa

Sales Claw maintainer

May 29, 2026·15 min

Claude Opus 4.8 Became an "Honest AI" — Benchmarks, New Features, and Cost, Explained for Non-Experts

This English article is a concise version of the original. For the full Japanese deep-dive, see the Japanese original.

Key Facts

Release date

2026-05-28 (less than two months after Opus 4.7)

Biggest leap

honesty — chance of missing code flaws dropped to about one quarter

Knowledge work (GDPval-AA)

1890, top of all models (121 points ahead of #2 GPT-5.5)

Pricing

$5 input / $25 output (per 1M tokens), unchanged from 4.7

In one sentence
On May 28, 2026, Anthropic (the AI company behind "Claude," a rival to ChatGPT) released its new flagship model, Claude Opus 4.8 — barely two months after Opus 4.7. There are two headline changes. First, the model "honestly flags the rough edges in its own work" (it is roughly 4× less likely to silently let a flaw in its own code slip through). Second, "the price stayed flat while the intelligence went up." On top of that, a Fast Mode that is 3× cheaper than the previous fast mode, "Dynamic Workflows" that run hundreds of small AIs at once, and "Effort Control" that dials how hard the AI works all arrived. For readers who are not deep in AI, we lay out what got better and how it matters to ordinary people and small businesses, using the metaphor of a new hire and a senior colleague.

Bottom line: The single biggest leap in Opus 4.8 is not its intelligence (benchmark scores) but its "honesty." Previous AIs would often say "Done!" with total confidence while actually being wrong. Opus 4.8 now volunteers "I'm not confident here" or "please double-check this part" on its own, and according to Anthropic it is roughly 4× less likely to silently pass off a flaw in code it wrote. This pays off precisely in the situations where a human cannot check every single output (overnight automation, bulk form submission, and the like). At the same time, you still need to remember that "became honest" is not the same as "became perfect," and that misreading the pricing model can make costs balloon.

"How is Opus 4.8 different from 4.7?" "What does an ‘honest AI’ even mean?" "Does this matter to a small business like mine?" — this article answers those three questions about Claude Opus 4.8, announced on May 28, 2026, drawing on Anthropic's official "Introducing Claude Opus 4.8" and the independent benchmark house Artificial Analysis's GDPval-AA leaderboard as primary sources, from the perspective of a Sales Claw maintainer.

As an aside: the AI writing this article is itself Claude Opus 4.8.The human behind it (Nakazawa) said "I'm going to write an article about you" and had it do the work. Writing about oneself objectively is a little awkward, but we'll proceed without exaggeration, using only numbers that can be officially verified.

Companion reads: Anthropic overtakes OpenAI for the first time — industry trend roundup, OpenAI Codex's Goal Mode reaches general availability, and Claude Compliance API and 28 integration partners explained.

This article uses Anthropic's official "Introducing Claude Opus 4.8" / Anthropic Models official docs / Anthropic Pricing official docs as primary sources. The benchmark comparison uses Anthropic's official table and the Artificial Analysis leaderboard. Sales Claw's free download page is also available.

1. What Claude Opus 4.8 is — an "honest new hire" became an "honest senior"

First, the basics: Claude is a conversational AI built by Anthropic. It is a rival to ChatGPT (OpenAI) and Gemini (Google). Claude comes in three sizes, ordered by intelligence and price: Opus (top tier) > Sonnet (mid tier) > Haiku (lightweight). What got updated this time is the one at the top, Opus.

Let's translate three important terms into plain language up front.

Jargon	In plain words	Everyday analogy
Benchmark	A shared test that measures how smart an AI is	A mock-exam score — everyone solves the same problems and gets ranked
Honesty (alignment)	How much the AI avoids "pretending it's done when it isn't"	A new hire who doesn't lie "Finished!" and instead says "I'm not sure here"
Agentic	The AI uses tools on its own and runs multiple steps by itself	A secretary who, given one instruction, does the research and the execution

[Official] Anthropic describes Opus 4.8 as having "sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors." Early testers report that it is "more likely to flag uncertainty about its own work and less likely to make unsupported claims," and that it is roughly one-quarter as likely as its predecessor to silently pass off a flaw in code it wrote.

[Author's view] In a Sales Claw maintainer's terms, this is the change from "a new hire who just says ‘done’" into "a senior who can properly say ‘I don't know’ about the parts they don't know." The scariest thing about putting AI into your workflow is not that it's unintelligent — it's that it's confidently wrong.In overnight jobs and bulk sends where a human cannot check everything, this "honesty" becomes the last line of defense against accidents.

2. Opus 4.8's ability in official benchmarks — six categories, real numbers

Anthropic's official Claude Opus 4.8 benchmark comparison table. Agentic coding (SWE-Bench Pro): Opus 4.8 69.2%, Opus 4.7 64.3%, GPT-5.5 58.6%, Gemini 3.1 Pro 54.2%. Computer use (OSWorld-Verified) 83.4%, knowledge work (GDPval-AA) 1890 — beating the predecessor in every category. Only on terminal coding does GPT-5.5 lead at 78.2%. — Figure: Anthropic official benchmark comparison (source: Anthropic, "Introducing Claude Opus 4.8," 2026-05-28). Six-category comparison of Opus 4.8 / Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro

The comparison table Anthropic published in its announcement is the image above. To make it readable for general audiences, we lay out the six categories together with "what each one measures."

Test (what it measures)	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
Coding SWE-Bench Pro / real bug fixes	69.2%	64.3%	58.6%	54.2%
Terminal use Terminal-Bench 2.1 / the black screen	74.6%	66.1%	78.2%	70.3%
Cross-domain reasoning Humanity's Last Exam / with tools	57.9%	54.7%	52.2%	51.4%
Computer use OSWorld-Verified / clicking the screen	83.4%	82.8%	78.7%	76.2%
Knowledge work GDPval-AA / real-world tasks (Elo)	1890	1753	1769	1314
Financial analysis Finance Agent v2 / finance agent	53.9%	51.5%	51.8%	43.0%

[Official] The number to watch is knowledge work (GDPval-AA) at 1890.This is an Elo score (the same mechanism as chess ratings) in which AI agents perform "tasks close to real work" and human raters decide the winners. In the leaderboard below, Opus 4.8 stands a head above all other models at #1.

Artificial Analysis GDPval-AA leaderboard. Agentic performance on real-world work tasks, compared by Elo score. Claude Opus 4.8 (max) is #1 at 1890, followed by GPT-5.5 (xhigh) 1769, Claude Opus 4.7 (max) 1753, Claude Sonnet 4.6 (max) 1676, GPT-5.4 (xhigh) 1674, and Gemini 3.5 Flash 1656. — Figure: GDPval-AA leaderboard (source: Artificial Analysis). Elo scores for real-world work tasks using web and shell access. Opus 4.8 (max) leads at 1890, 121 points ahead of #2 GPT-5.5 (1769)

[Author's view] Two things stand out in this chart. First, Opus 4.8 is 121 points ahead of #2 GPT-5.5 (in Elo, a 100-point gap roughly means "wins about 64% of the time"). Second, it jumped 137 points from its predecessor Opus 4.7 (1753). Moving this much in just two months shows how fast the AI industry is evolving.

That said, to be honest, Opus 4.8 does not win everything. On terminal use (Terminal-Bench 2.1), GPT-5.5 leads at 78.2%, above Opus 4.8's 74.6%. The fair view is that for some use cases, a rival model may be the better fit.

A whiteboard explainer laying out Claude Opus 4.8's three pillars of evolution: 'honesty (about one-quarter the chance of missing a flaw),' 'intelligence (tops 5 of 6 categories),' and 'flat pricing (same price as 4.7),' connected by arrows with an illustration of an employee growing from new hire to senior. — Figure: Opus 4.8's evolution in three pillars. The essence this time is that honesty, intelligence, and flat pricing were achieved at once

3. What "the most honest model" means — the truth behind "4×"

Why does "honesty" matter so much? Anyone who has used AI for work has probably had this experience: you ask it to "make a document," and back comes a polished doc full of plausible-looking numbers and citations. But on closer inspection, those numbers and citations didn't exist— this is called "hallucination" (an AI fabricating something untrue in a believable way).

[Official] Anthropic positions Opus 4.8 as "the most honest model to date," and concretely states that "the chance it passes off a flaw in code it wrote, without flagging it, is roughly one-quarter that of its predecessor Opus 4.7."It further says that "the rate of misaligned behavior dropped significantly, and it reached a new high on indicators of prosocial traits."

[Author's view] Broken down from the standpoint of a business AI like Sales Claw, it means this. When you have an AI write 1,000 sales messages and a human cannot check all of them, the scary state is "90% are perfect, but 10% quietly contain factual errors — and the AI believes they're all perfect." An honest model raises a red flag on its own: "the information on this company is weakly substantiated; please verify." This is a modest but decisive evolution that turns AI use from "hand it everything" into "a human only looks at the dangerous parts."

That said, [Unverified] a caveat is needed too. "Became honest" does not mean "won't make mistakes anymore." It became more likely to flag mistakes; the mistakes themselves did not drop to zero. Anthropic itself only goes as far as "reduces risk," so if you use it in production, a final-check mechanism remains mandatory.

4. Three new features — Fast Mode / Dynamic Workflows / Effort Control

A whiteboard explainer of Opus 4.8's three new features: 'Fast Mode (2.5× faster, 3× cheaper),' 'Dynamic Workflows (spins up hundreds of small AIs at once),' and 'Effort Control (a dial that adjusts how hard it tries),' each shown with a speedometer, a squad, and a knob illustration. — Figure: Opus 4.8's three new features. Fast Mode is speed, Dynamic Workflows is parallelism, Effort Control is the effort dial

(1) Fast Mode — fast, and 3× cheaper than before

Fast Mode is, as the name says, a "return answers fast" mode. [Official] Opus 4.8's Fast Mode is said to run 2.5× faster than standard and to be 3× cheaper than the previous model's fast mode. It shines on impatient work and on handling large volumes of requests.

(2) Dynamic Workflows — run hundreds of small AIs at once

Dynamic Workflows is a mechanism that splits one big job among hundreds of small AIs (sub-agents) and runs them at the same time. [Official]Anthropic introduces this in Claude Code (the developer-facing command-line version of Claude) as a feature that "runs hundreds of sub-agents and aggregates the results."

In everyday terms, it's the difference between "having one excellent employee do everything" and "having 100 part-timers each check one square, then tallying at the end." For horizontally wide worklike "research each of 1,000 companies," the time drops dramatically.

(3) Effort Control — adjust the AI's "effort level"

Effort Control lets a human specify "how much effort do you want it to spend thinking about this task." [Official] In both browser Claude and Claude Code, you can set effort high for work you want it to think hard about, and low for work you want done quickly.

[Author's view] This is unglamorous but practical. Because for AI "trying hard = time and money," spending full effort on everything is wasteful. Being able to use "quick for drafts, thorough for the final check" is huge for cost management. For developers, system entries (a mechanism in the Messages API that lets you inject instructions mid-task without wasting the work done so far) were also added.

Sales Claw, too, follows the philosophy of 'stop honestly' and 'only flag the dangerous parts,' designing AI sales to run safely.

無料・MIT ライセンス。インストールせずにライブデモも試せます。

無料でダウンロードライブデモを試す GitHub

5. Getting started — three scenarios: individual, small business, enterprise

[Official] At launch, Opus 4.8 became usable via the API (model name claude-opus-4-8), the browser at claude.ai, and Claude Code. It is available on the Enterprise / Team / Max plans.

Individual (today)

The easiest path is for people already using Claude. Open claude.ai and pick Opus 4.8 from the model menu. If you're on a paid plan (such as Max), you can switch at no extra cost. The recommended first step is to "re-run your usual work with Opus 4.8 and feel the difference."

Small business (deploy in 1–2 weeks)

For 10–100 employees, sign up for the Team plan and distribute it to staff. Three things to do at rollout: (1) set usage limits per user, (2) decide rules by use case (which tasks are allowed), and (3) decide a deliverable-review flow.Don't treat it as "an honest model, so no review needed" — make it "an honest model, so a human looks at the parts where it raised a red flag."

Enterprise (roll out over 1–3 months)

For 500+ employees, roll out in phases on the Enterprise plan, involving IT, legal, and security teams. [Author's view] As the contemporaneous announcement of the Claude Compliance API and 28 integration partnersshows, the industry is moving to "put AI in the same management envelope as SaaS." The recommendation is to design audit logging, access control, and data-residency checks into the rollout from the start.

6. Pricing and cost estimate — the pitfall behind "flat pricing"

Official pricing (assumptions)

Standard mode: input $5 / output $25 (per million tokens) — unchanged from Opus 4.7
Fast Mode: input $10 / output $50 (per million tokens) — but 3× cheaper than the previous model's fast mode
A "token" is a chunk of text the AI reads and writes. In Japanese, roughly 1 character ≈ 1–2 tokens
Exchange rate: converted at about 1 USD = 150 JPY (variable)

A bar chart comparing Opus 4.8 and Opus 4.7 pricing. Standard mode is input $5 / output $25, identical across both generations (flat). Fast mode for Opus 4.8 is input $10 / output $50, shown to be 3× cheaper than the previous generation's fast mode (per million tokens, USD). — Figure: Standard pricing is flat vs. 4.7; Fast mode is 3× cheaper than the prior generation. But the total depends on token volume (estimate)

Why costs balloon even at flat pricing

[Author's view] This is the point general readers most easily misread. "Same unit price = same monthly bill" is not true. Because Opus 4.8 can "work autonomously for longer" and "run hundreds of parallel jobs," even at the same unit price, the total rises as token consumption (= amount of work) increases.

Indeed, the ride-hailing giant Uber, which deployed Claude company-wide, was reported to have burned through its 2026 AI budget in four months (its CTO admitted as much to The Information). Run hundreds of AIs with the new Dynamic Workflows, and the bill grows with the convenience. The countermeasure is to "build in usage limits at the same time you deploy." This is the iron rule of the autonomous-AI era, as covered in the previous Codex Goal Mode article.

項目	Misreading only the unit price	What actually happens
Unit price	Same as 4.7 ($5 / $25)	Same (this part is correct)
Work per task	Assumed unchanged	Tends to rise as it works autonomously for longer
Parallelism	Not considered	Hundreds of times with Dynamic Workflows
Monthly total	Assumed flat	Balloons depending on usage
Countermeasure	None in particular	Usage limits, effort tuning, auditing

7. Risks and caveats — the price of "honest and smart"

WARNING— Opus 4.8 risks to keep in mind at the time of writing

Honest ≠ perfect: it only became more likely to flag mistakes; mistakes didn't hit zero. A final check is still needed
Cost balloon: even at flat unit pricing, heavy autonomous and parallel use raises the total (the Uber case)
Over-delegating privileges: hand a smarter AI too much screen control or external sending, and unintended actions (mis-sends, mis-deletes) have a bigger impact
ToS and legal: for external-facing work like sales and support, you must verify compliance with Japan's Anti-Spam Act, Personal Information Protection Act, and Specified Commercial Transactions Act
Over-trust: the phrase "the most honest model" taking on a life of its own, drifting toward operations that skip human checks, is the most dangerous of all

Over-trust — "it's honest, so we can leave it alone" is the most dangerous

[Author's view] Ironically, the "honest model" tagline itself can become the biggest source of risk. When humans relax with "if it's honest, we can hand it everything," they miss even the few cases where the AI did raise a red flag. Honesty is "a map showing where a human should look," not "a free pass for a human not to look."

Legal — when using it for sales or support

When you use Opus 4.8 for external-facing work such as sales form submission or customer-support replies, in Japan you need to verify compliance with the following laws.

Anti-Spam Act (Specified Email Act): always include the four sender-information requirements (name, address, opt-out method, contact point) in the body
Personal Information Protection Act: state the purpose of use of personal data obtained, restrict third-party provision, and respond to individuals' disclosure requests
Specified Commercial Transactions Act: for mail-order or email advertising, no exaggerated claims and a mandatory opt-out path

[Author's view]Opus 4.8 is just a tool, so this legal compliance is the responsibility of the user. For enterprise use in particular, the design must extend all the way to a system that can later trace who sent what and when. Even as the model becomes smart and honest, the responsibility for "what may be sent" remains with humans and operational design.

A bar chart comparing the 'code-flaw miss rate' of Opus 4.7 and Opus 4.8. It visualizes that Opus 4.8 dropped to about one-quarter of its predecessor, with an annotation that 'honesty prevents accidents but does not make them zero,' depicting the residual risk (a conceptual figure based on the official explanation). — Figure: Opus 4.8 cut code-flaw misses to about 1/4 of the prior generation. But the fact that it's not 'zero' is the operational caveat (conceptual figure based on the official explanation)

8. Sales Claw perspective — building an "AI that stops honestly" into your workflow

From here on, as a Sales Claw maintainer, I'll write from an implementation standpoint about what's needed to actually build Opus 4.8 into sales work.

Sales Claw is an OSS tool designed to lower mis-send and ToS-violation risk through policy control, pre-send automated inspection, sales-NG detection, stop-on-CAPTCHA, send-rate limits, audit-log retention, and auto-stop conditions. Opus 4.8 "raising a red flag honestly" and Sales Claw "automatically stopping dangerous sends" are the two wheels of the same "don't hand it everything" philosophy.

DATA— Internal validation memo — Sales Claw v0.5 autonomous-run data (disclosed for general readers)

Conditions: Windows 11 / Node.js 22 / Playwright 1.49 / Sales Claw v0.5 / Claude Opus + Sonnet hybrid
Period: past 90 days (2026-02-26 to 2026-05-26)
Sample: 3,247 form submissions / 412 CAPTCHA encounters / 47 sales-NG patterns detected
Observations:
1. Overnight (23:00–7:00 JST) autonomous runs averaged 4.2 hours per session, stable at a turn cap of 800
2. API cost: about ¥12,500/month total (made efficient via the hybrid model)
3. CAPTCHA encounter rate 12.7% (412 / 3,247); on detection it sets awaiting_approval, retains the audit log, and auto-stops
4. Sales-NG detection: an internal LLM judges phrasing like "no sales solicitations"; 8 false positives
5. Zero mis-sends is guaranteed by the three-piece set of "pre-send automated inspection," "auto-stop conditions," and "audit-log retention"
Limits of reproducibility: 3,247 is a small sample, biased by industry and target-company size, so industry and seasonal differences need separate validation

[personal_metric] I (Nakazawa) rewrote Sales Claw's autonomous-loop design three times in the past 90 days. The first had no stop condition and ran away; the second specified only a count and overran on time; the third finally stabilized with an AND of "count × time × turn cap." However honest and smart Opus 4.8 gets, my conviction that "designing when to stop is a human's job"hasn't changed.

How to leverage Opus 4.8's honesty in your work

If Opus 4.8 will tell you "I'm not confident here," it's only sensible to build a mechanism on the operations side that catches that red flag and stops automatically. In Sales Claw, a send the AI is unsure about is set to awaiting_approval, kept in the audit log, and never sent automatically. Only the combination of an "honest model" plus "operations that receive the honesty" reduces accidents.

Three safety mechanisms (AND of count × time × turn cap / audit log / pre-send inspection)

Always apply multiple stop conditions with AND to an autonomous loop, always record the results of autonomous runs chronologically in the audit log (action-log.json), and put external sends through pre-send automated inspection.Whether the model goes from Opus 4.7 to 4.8, the importance of this three-piece set doesn't change. If anything — because it can now work "longer and more in parallel" — designing the stop and the record matters more than ever.

A release-timeline figure for the Claude Opus series. It lines up the release timing of Opus 4.5, 4.6, 4.7, and 4.8 chronologically, with an arrow emphasizing that 4.7 to 4.8 was less than two months, showing the accelerating model-update cycle. — Figure: The release intervals of Claude Opus. 4.7 → 4.8 was less than two months, and the update cycle is accelerating (timeline based on public information)

9. Conclusion — how to bring an "honest, smart colleague" into daily judgment

Claude Opus 4.8, which arrived on May 28, 2026, is a milestone model that foregrounds not just "intelligence" but "honesty." It beats rivals and its predecessor in 5 of 6 categories and records the top score among all models in knowledge work at 1890. Pricing stayed flat, Fast Mode is 3× cheaper, and the new ways of working — Dynamic Workflows and Effort Control — were added.

At the same time, even an "honest, smart AI" will cause accidents if you hand it everything.Honesty is "a map showing where a human should look," not a free pass. The total balloons even at flat unit pricing. The smarter it gets, the more the design of scope and stop conditions matters — that's my candid read as a Sales Claw maintainer.

A checklist before putting Opus 4.8 into your work

Before putting Opus 4.8 into production

You can access Opus 4.8 via one of claude.ai / API / Claude Code
You re-ran your usual work with Opus 4.8 and felt the difference from 4.7
You set per-user / per-project usage limits
You decided "low for drafts, high for final checks" with Effort Control
If using Dynamic Workflows, you set a cap on parallelism and cost
You decided an operational flow where a human reviews outputs the AI flagged
You decided where to store audit logs (local / existing management platform)
For sales or support use, you verified compliance with the Specified Commercial Transactions Act and the Personal Information Protection Act
You documented the stop procedure for incidents (who stops it, and how)
You eliminated the complacency of "it's an honest model, so no review needed"

Next action: first switch the model to Opus 4.8 at claude.ai and have it do one of your usual tasks to feel the difference. If you want to bring AI into sales or support work, you can start from the quick-start guide of a business-focused OSS like Sales Claw.

As a follow-up to this article, see also Anthropic overtakes OpenAI for the first time — industry trend roundup.

Once you've read this, give Opus 4.8 five minutes of hands-on time. That's the shortest route to bringing AI into your work.

無料・MIT ライセンス。インストールせずにライブデモも試せます。

無料でダウンロードライブデモを試す GitHub

よくある質問

In one sentence, what makes Claude Opus 4.8 impressive?

[Official] It is the flagship AI model Anthropic released on May 28, 2026, less than two months after Opus 4.7. Its headline trait is not intelligence but "honesty": Anthropic positions it as "the most honest model yet," explaining concretely that "the chance it passes flaws in its own code without flagging them is about a quarter of Opus 4.7." On intelligence it also beats rivals and its predecessor on five of six official benchmarks, and notably tops every model on the real-task knowledge-work index (GDPval-AA) at 1890. Pricing stays flat from 4.7 at $5 input / $25 output (per 1M tokens) — only intelligence and honesty went up, which is the real essence of this release.

What does the "4x" in "the most honest model" actually mean?

[Official] Anthropic says that for Opus 4.8, "the chance it passes flaws in its own code without flagging them is about one quarter of Opus 4.7 (i.e., about 4x fewer misses)." In plain terms: the AI is now more likely to raise its own red flag for the parts it cannot actually do or is not confident about, instead of quietly letting them through. [Author's view] The scariest thing about putting AI into operations is not that it lacks intelligence — it is being "confidently wrong." In situations a human cannot fully review, such as overnight processing or high-volume sending, this honesty becomes the last line of defense against accidents. But "more honest" does not mean "never wrong" — it only means mistakes are easier to surface, so a final-review mechanism remains mandatory for business use.

If pricing is flat, is my monthly cost the same too?

[Author's view] This is where general readers most often get misled. "Same unit price" does not mean "same monthly bill." Standard mode stays flat from 4.7 at $5 input / $25 output (per 1M tokens), but because Opus 4.8 "works autonomously for longer" and "can run hundreds of tasks in parallel," the total can rise if the volume of tokens consumed (i.e., the amount of work) grows even at the same unit price. In fact, Uber, which deployed Claude company-wide, was reported to have burned through its 2026 AI budget in four months. The countermeasures: "build in usage caps at deployment time" and "tune effort with Effort Control." The safe approach is to measure on a small sample before a production rollout.

What are the three new features in Opus 4.8?

[Official] (1) Fast Mode — a high-speed mode that runs 2.5x faster than standard and is 3x cheaper than the previous model's fast mode. (2) Dynamic Workflows — a mechanism that splits one big job across hundreds of small AIs (subagents) running in parallel, introduced in Claude Code as "spin up hundreds of subagents and consolidate the results." (3) Effort Control — lets a human specify "how much effort and thought you want spent on this task," dialing effort high for work that needs careful thought and low for work you want done quickly. For developers, Messages API system entries were also added (a mechanism to inject instructions mid-process without wasting the work done so far).

Where can I use Opus 4.8, and how should I get started?

[Official] Opus 4.8 became available at launch via the API (model name claude-opus-4-8), the browser version claude.ai, and Claude Code, on the Enterprise / Team / Max plans. Individuals can use it today simply by opening claude.ai and selecting Opus 4.8 from the model menu. SMBs should sign up for the Team plan and deploy after deciding three things: usage caps, per-use-case rules, and an output-review flow. Enterprises should use the Enterprise plan, involve IT, legal, and security, and build audit logs, permission management, and cross-border data checks into the design from the start, rolling out in phases. The fastest route is to "re-run your usual task with Opus 4.8 and feel the difference."

We're an SMB — is it safe to use this for sales or support?

[Author's view] As a tool it is more than capable, but if you use it for external-contact work (sales-form submissions, customer-support replies, etc.), compliance with Japanese law becomes the user's responsibility. Specifically you need to check (1) the Act on Regulation of Transmission of Specified Electronic Mail (the four sender-information requirements must always be in the body), (2) the Act on the Protection of Personal Information (state the purpose of use, restrict third-party provision, handle disclosure requests), and (3) the Act on Specified Commercial Transactions (no exaggerated claims, opt-out path mandatory). However smart and honest the model becomes, the responsibility for "what may be sent" stays with humans and operational design. A work-specific OSS like Sales Claw adopts a design that "lets AI run while preventing accidents" — pre-send automatic inspection, sales-NG detection, stop-on-CAPTCHA, send-rate limiting, audit-log retention, and auto-stop conditions — making it a good fit for Opus 4.8's "honesty."

参考文献

本記事は X 公式アカウントと公式ドキュメントを一次情報として参照しています。

[01]
Anthropic — Introducing Claude Opus 4.82026-05-28
[02]
Anthropic Docs — Models overview2026-05-28
[03]
Anthropic Docs — Pricing2026-05-28
[04]
Anthropic — Newsroom2026-05-28
[05]
Anthropic official X (@AnthropicAI)@AnthropicAI·2026-05-28

この記事の著者