Can Cursor Composer 2.5 Match Opus 4.7 / GPT-5.5? — Kimi K2.5 + 25x RL at 1/10 the Cost

Composer 2.5 (2026-05-18) takes the same Moonshot Kimi K2.5 checkpoint as Composer 2, scales synthetic RL tasks 25x, and lands on par with Claude Opus 4.7 / GPT-5.5 on SWE-Bench Multilingual at roughly 1/10 the cost. We pull from Cursor's official blog, changelog, and forum to cover benchmarks, architecture, pricing strategy, long-horizon improvements, the no-public-API risk, and Sales Claw's read on targeted RL.

中澤圭志

@keishi_nakazawa

Sales Claw maintainer

May 18, 2026·11 min

This English article is a concise version of the original. For the full Japanese deep-dive, see the Japanese original.

Key Facts

Release date: 2026-05-18 (US time, Cursor official blog / changelog)
Base checkpoint: Moonshot Kimi K2.5 + 25x synthetic RL tasks
Pricing (per 1M tokens, input / output): Standard $0.50 / $2.50, Fast $3.00 / $15.00
SWE-Bench Multilingual: 79.8% (Opus 4.7 80.5% / GPT-5.5 77.8%)

"Cursor Composer 2.5 just shipped. The post says it lines up with Opus 4.7 and GPT-5.5, but can a vendor-built model really hold its own against the frontier? What does it mean architecturally to take the same Kimi K2.5 base and 25x the synthetic RL workload? And what about the pricing?"

This article walks through Composer 2.5 as a long-horizon coding agent model, using Cursor's official blog, changelog, and forum as primary sources. We write from the seat of a team that builds self-running loops where dozens of tool calls chain together (Sales Claw), and we focus on what changed and where the traps are.

On May 18, 2026 (US time), Cursor released the latest revision of its in-house coding model, Composer 2.5. Like the previous Composer 2 (shipped 2025-10 alongside Cursor 2.0), Composer 2.5 starts from the Moonshot Kimi K2.5 checkpoint and adds continued pretraining plus RL on top — but the volume of synthetic RL tasks is now 25x larger, and the optimizer stack is upgraded to Sharded Muon with distributed orthogonalization to keep the RL run scalable.

This article uses the Cursor official blog ("Introducing Composer 2.5", 2026-05-18), the Cursor changelog (Composer 2.5), the official Cursor forum announcement, and the official Cursor docs (Models) as primary sources. Third-party benches and individual X posts are treated as auxiliary reading and are deliberately kept out of the JSON-LD citations.

1. What is Composer 2.5 — release overview

[Official] The Cursor official blog (2026-05-18) frames Composer 2.5 as "our most powerful model yet", claiming a "substantial improvement in intelligence and behaviour" over Composer 2. At release, Composer 2.5 became the default model inside the Cursor app and is also adopted as the default in Cloud Agents (the background-agent execution platform).

[Official] The Composer lineage is three generations deep:

Composer 1 (bundled with Cursor 2.0, 2025-10): Cursor's first in-house coding agent model, oriented toward incremental in-IDE editing.
Composer 2 (Cursor blog "Composer 2", 2025-10): Continued pretraining + RL on top of the Moonshot Kimi K2.5 checkpoint. Re-designed for agentic use.
Composer 2.5 (2026-05-18): Same checkpoint and price band as Composer 2, but scaled with 25x synthetic RL tasks + Sharded Muon.

[Author's view] Read across the lineage and a clear strategy emerges: "hold the base checkpoint constant and attack via RL volume and quality." Cursor isn't chasing the frontier labs by training a 100B+ dense model from scratch. Instead, it starts from an open Moonshot checkpoint and uses RL to push the model toward the actual task distribution observed inside Cursor itself.

2. Benchmarks — how it lines up with Opus 4.7 and GPT-5.5

[Official] Both the Cursor blog and changelog report Composer 2.5 benches on three axes, each compared against Opus 4.7 and GPT-5.5. Third-party outlets (the-decoder, officechai) cite the same numbers.

Figure 1: Composer 2.5 vs Opus 4.7 vs GPT-5.5 across three benchmarks (SWE-Bench Multilingual / Terminal-Bench 2.0 / CursorBench v3.1). Bar chart values: Composer 2.5 79.8% / 69.3% / 63.2%; Opus 4.7 max 80.5% / 69.4% / 64.8%; GPT-5.5 77.8% / 82.7% / 59.2%. GPT-5.5 leads on Terminal-Bench; Composer 2.5 roughly ties Opus 4.7 on Multilingual and CursorBench, so each benchmark gives a different winner.

SWE-Bench Multilingual — 0.7pt off the frontier

[Official] On SWE-Bench Multilingual (the official benchmark for cross-language real-repo patch generation), Composer 2.5 scores 79.8%, just 0.7pt below Opus 4.7 (80.5%) and 2pt above GPT-5.5 (77.8%). The chart is in the Cursor blog, and third-party outlets quote the same figures.

[Author's view] The meaningful fact here is simply that "a relatively small in-house model lands within one point of the frontier labs' full-power models." For coding workloads, the choice is no longer "giant general-purpose LLM only" — a thick layer of domain-specific RL on top of a specialised model is now a realistic option.

Terminal-Bench 2.0 — GPT-5.5 leads, Composer 2.5 ties Opus 4.7

[Official] On Terminal-Bench 2.0, Composer 2.5 hits 69.3%, Opus 4.7 hits 69.4%, and GPT-5.5 pulls ahead at 82.7%. Terminal tasks here combine bash operations, package installs, and dependency resolution — areas where GPT-5.5's terminal reinforcement clearly shows.

[Author's view] The differentiator on Terminal-Bench is "reading subtle shell error messages and recovering from them." Cursor optimises Composer 2.5 for IDE-resident agent tasks, so it makes sense that GPT-5.5 is still ahead on pure terminal recovery loops.

CursorBench v3.1 — narrowly beats Opus 4.7 in places

[Official] On Cursor's in-house CursorBench v3.1 (synthetic agent tasks composed from real Cursor usage logs), Composer 2.5 scores 63.2%. That beats Opus 4.7 xhigh (61.6%) and lands within 1.6pt of Opus 4.7 max (64.8%). GPT-5.5 trails at 59.2%.

[Author's view] In-house benches are structurally biased toward in-house models, so claiming "we beat Opus 4.7" on the back of CursorBench is risky. When you cite CursorBench, always attach the qualifier "within the Cursor usage distribution."

Metric	Composer 2.5 (in-house, relatively small)	Opus 4.7 / GPT-5.5 (frontier)
SWE-Bench Multilingual	79.8%	Opus 80.5% / GPT-5.5 77.8%
Terminal-Bench 2.0	69.3%	Opus 69.4% / GPT-5.5 82.7%
CursorBench v3.1	63.2%	Opus max 64.8% / GPT-5.5 59.2%
Cost per task (CursorBench, official table)	< $1	Frontier models up to $11
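The point gaps quoted in this section can be re-derived from the table. The following is a minimal sketch: the scores are copied from the rows above, while the dictionary layout and the `gaps` helper are our own illustrative choices, not anything Cursor publishes.

```python
# Scores (%) copied from the table above. The Opus 4.7 CursorBench figure is the "max" setting.
# The data layout and function name are illustrative assumptions.
SCORES = {
    "SWE-Bench Multilingual": {"Composer 2.5": 79.8, "Opus 4.7": 80.5, "GPT-5.5": 77.8},
    "Terminal-Bench 2.0": {"Composer 2.5": 69.3, "Opus 4.7": 69.4, "GPT-5.5": 82.7},
    "CursorBench v3.1": {"Composer 2.5": 63.2, "Opus 4.7": 64.8, "GPT-5.5": 59.2},
}


def gaps(bench, model="Composer 2.5"):
    """Return the model's score minus each rival's score, in percentage points."""
    row = SCORES[bench]
    return {rival: round(row[model] - score, 1)
            for rival, score in row.items() if rival != model}


for bench, row in SCORES.items():
    leader = max(row, key=row.get)  # model with the highest score on this bench
    print(f"{bench}: leader={leader}, gaps vs Composer 2.5={gaps(bench)}")
```

A negative gap means Composer 2.5 trails that rival. Running this makes it easy to see that the leader changes per benchmark, which is the reason the article avoids a single "winner" verdict.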

Figure (whiteboard overview): the three-axis benchmark comparison at a glance. SWE-Bench Multilingual is a near-draw, Terminal-Bench goes to GPT-5.5, and CursorBench carries a home-court bias warning.

3. Architecture — Kimi K2.5 + Sharded Muon + targeted RL

[Official] The Cursor blog lists three architectural ingredients:

Moonshot Kimi K2.5 baseline — continued pretraining from the open checkpoint (same foundation as Composer 2)
Sharded Muon optimizer + distributed orthogonalization — the optimizer stack that lets the RL run stay stable across many GPUs
Targeted RL with textual feedback — RL design that uses natural-language feedback (not just scalar reward) to shape model behaviour

[Author's view] What you can read out of this stack is a clear strategic choice: "earn capability through RL quality and quantity, not by carrying giant weights." Instead of a 100B+ dense frontier model, Cursor takes Kimi K2.5 (a public MoE checkpoint) and pumps in a huge volume of synthetic agentic-task RL to fit the "tasks that actually happen inside Cursor." This is exactly the direction a domain-specialised agent like Sales Claw could take.

What "25x synthetic RL tasks" actually means

[Official] Cursor is explicit that Composer 2.5 has 25x the volume of synthetic RL tasks compared to Composer 2. The absolute number is not disclosed, but if Composer 2's RL used tens to hundreds of thousands of tasks, Composer 2.5 is plausibly into the millions ([Speculation] — no official absolute is provided).

[Official] Sharded Muon optimizer with distributed orthogonalization is essentially the scaling toolkit for the Muon optimizer (proposed by Keller Jordan and others in 2024; it orthogonalizes the momentum update of each 2D weight matrix with a Newton-Schulz iteration and is related to Shampoo-family optimizers such as SOAP). It addresses the update instability that becomes practically unavoidable when you scale RL this aggressively.
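To make "orthogonalization" concrete, here is a minimal pure-Python sketch of a Newton-Schulz iteration, the kind of step Muon applies to a momentum matrix. It uses the classic cubic iteration rather than Muon's tuned coefficients, assumes a non-zero, full-rank matrix with no more rows than columns, and says nothing about how Cursor shards or distributes the computation (that is not published). All function names are our own.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Push every singular value of g toward 1 via X <- 1.5*X - 0.5*(X X^T) X."""
    norm = math.sqrt(sum(v * v for row in g for v in row))
    # Frobenius scaling keeps all singular values in (0, 1], where the iteration converges.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxtx)]
    return x


# Hypothetical 2x3 "momentum" matrix standing in for one layer's update.
update = orthogonalize([[3.0, 0.0, 4.0], [1.0, 2.0, 0.0]])
gram = matmul(update, transpose(update))  # should be close to the 2x2 identity
print([[round(v, 6) for v in row] for row in gram])
```

The result keeps the directions of the original update but equalises their magnitudes, which is the property Muon relies on. At scale this matrix product is the expensive part, which is presumably why a sharded, distributed variant matters.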

Targeted RL with textual feedback

[Official] Rather than scalar reward alone, Cursor uses textual feedback (natural-language feedback) to drive targeted RL. The exact technique (RLAIF-style? Constitutional AI-style?) is not specified, so it remains [Unverified]. The blog does, however, explicitly describe the design as "surgically" fixing specific failure modes (e.g. "abandoning a tool-call loop too early" or "losing consistency across multi-file edits").

4. Long-horizon and tool calling — surviving hundreds of chained calls

[Official] The Cursor blog is explicit that Composer 2.5 improves on "sustained work on long-running tasks." Concretely, the success rate on long-horizon tasks involving hundreds of tool calls is now higher than the previous model, and sampling-temperature stability has improved enough to make Cloud Agents practical, per the post.

Figure 2: Drift comparison between Composer 2 and Composer 2.5 on a long-horizon agent loop (tool call 1 → 2 → ... → hundreds). In the diagram, the Composer 2 row goes off the rails partway through, while the Composer 2.5 row holds consistency to the end; per-chain consistency degradation is alleviated.

[Author's view] From an implementation standpoint, the three dominant failure modes for long-horizon agents are:

(1) Context drift: losing the original intent mid-chain
(2) Early tool-call termination: declaring success while the work is still incomplete
(3) Multi-file / parallel subagent inconsistency: parts of the codebase are edited on stale premises

Composer 2.5 seems to put particular weight on fixing (2) and (3). This is where targeted RL with textual feedback finds its outlet. For products like Sales Claw — where one loop is "company lookup → form parsing → input → validation → send" spanning dozens of steps — this is exactly the improvement that matters.

Tool-call budget and event-loop endurance

[Unverified] Cursor doesn't publish hard numbers for "how many minutes / how many tool calls before instability sets in." The blog stays qualitative: "hundreds of tool calls," "sustained work," etc. In real deployments, you'll have to measure for yourself which ceiling hits first: the Cloud Agents timeout (estimated ~60 minutes, [Speculation]) or the Composer 2.5 context window (inherited from Kimi K2.5's published 200K–256K, [Speculation]).
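One way to find out which ceiling hits first is to track all three budgets inside your own agent loop. The sketch below is a hypothetical harness, not a Cursor API: the `RunBudget` class, its field names, and every default limit are assumptions you should replace with values measured on your own workload.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunBudget:
    # All limits are placeholders; the article's own figures are speculative.
    max_seconds: float = 60 * 60
    max_tool_calls: int = 300
    max_context_tokens: int = 200_000
    started_at: float = field(default_factory=time.monotonic)
    tool_calls: int = 0
    context_tokens: int = 0

    def record(self, tokens_added: int) -> None:
        """Account for one finished tool call and the tokens it added to the context."""
        self.tool_calls += 1
        self.context_tokens += tokens_added

    def exhausted(self, now: Optional[float] = None) -> Optional[str]:
        """Return the name of the first ceiling hit, or None if the run may continue."""
        elapsed = (time.monotonic() if now is None else now) - self.started_at
        if elapsed >= self.max_seconds:
            return "timeout"
        if self.tool_calls >= self.max_tool_calls:
            return "tool_calls"
        if self.context_tokens >= self.max_context_tokens:
            return "context_window"
        return None


budget = RunBudget(max_tool_calls=5, max_context_tokens=10_000)
while budget.exhausted() is None:
    budget.record(tokens_added=1_500)  # stand-in for one real tool call
print(budget.exhausted(), budget.tool_calls)
```

Logging the returned reason for every aborted run gives you an empirical distribution of ceilings, which is more useful than any single quoted limit.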

5. Pricing strategy — Standard / Fast and what "1/10 cost" really refers to

[Official] Cursor's changelog publishes Composer 2.5's API price band as:

Standard: $0.50 / 1M input tokens, $2.50 / 1M output tokens (same as Composer 2)
Fast (default): $3.00 / 1M input tokens, $15.00 / 1M output tokens (same intelligence, higher throughput)

[Official] Alongside that, the blog asserts "CursorBench per-task cost is under $1 for Composer 2.5 vs up to $11 for competing frontier models." This is the source of the "1/10 cost" framing, but it's a median band on Cursor's own bench, not a universal fact, so your tasks may land elsewhere.
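To see how the list prices translate into a per-task figure, a small calculator helps. This is a sketch under stated assumptions: the prices are the ones quoted above, the token counts in the example are invented for illustration, and any caching or plan discounts are ignored.

```python
# USD per 1M tokens as (input, output), copied from the price band above.
PRICES = {
    "standard": (0.50, 2.50),
    "fast": (3.00, 15.00),
}


def task_cost(tier, input_tokens, output_tokens):
    """List-price cost in USD for one task; ignores caching and included usage."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical long agent task: 1.2M input tokens, 100K output tokens.
for tier in PRICES:
    print(tier, round(task_cost(tier, 1_200_000, 100_000), 2))
```

Because Fast is priced at six times Standard on both input and output, the same task always costs six times more on Fast regardless of its input/output mix.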

Figure 3: Approximate per-task cost in USD (per Cursor's official numbers) for Composer 2.5 Standard, Composer 2.5 Fast, Opus 4.7 xhigh, and GPT-5.5 high. Composer 2.5 Standard is cheapest at under $1; Opus 4.7 xhigh is around $10-11; GPT-5.5 high is around $5.

What the Fast tier is for

[Author's view] The Fast tier ($3.00 / $15.00) is positioned to "deliver the same intelligence faster." The intent is to route latency-sensitive use cases (in-IDE inline suggestions, conversational chat) to Fast, while Cloud Agents (after-the-fact batch) stay on Standard. Anthropic and OpenAI offer dials like temperature and reasoning_effort on a single model; Cursor instead splits one model across two price tiers with identical intelligence.

Included usage and the initial 2x boost

[Official] Pro / Business / Enterprise included usage is doubled for the first week (2026-05-18 onward). [Unverified]: the absolute starting included usage isn't clearly published per plan on the pricing page, so you'll need to wait for the official update or read it from your management screen. In practice, use this first week to feel out Composer 2.5, then convert back to non-2x cost when estimating the steady-state bill.

6. Getting started — enabling Composer 2.5 in Cursor IDE and Cloud Agents

[Official] Per the Cursor changelog (Composer 2.5), no special setup is required: the latest Cursor app automatically switches its default model to Composer 2.5. To pin it explicitly:

Update the Cursor app to the latest version (check Settings > About for 0.50.x or later)
Open Settings > Models to bring up the model selector
Choose "composer-2.5", or leave it on "Auto" and it will be used as the default
For Cloud Agents, create a new agent from the Cloud Agents tab on cursor.com and pick composer-2.5 as the model

Figure 4: How to split workloads between Cursor IDE and Cloud Agents on Composer 2.5. The same model runs in two tiers: Cursor IDE on Fast (interactive use, inline suggestions) and Cloud Agents on Standard (background agents, long-horizon tasks).

Figure 5: Composer lineage timeline: Composer 1 (2025-10, bundled with Cursor 2.0), Composer 2 (2025-10, Kimi K2.5 base), and Composer 2.5 (2026-05-18, 25x RL), tracking intelligence, cost efficiency, and long-horizon endurance along the lineage.

API access

[Unverified] As of 2026-05-18, there is no published path to call Composer 2.5 directly outside of Cursor IDE / Cloud Agents, and nothing equivalent to the pay-per-token API access that Anthropic and OpenAI offer. If you need to embed it into another product, your current option is to wrap the Cursor Agent / Background Agent via webhook or CLI. Whether an open API is planned is something to watch for in the official docs and announcements.

7. Risks — in-house bench bias, no public API, evaluation pitfalls

Risk 1: In-house bench bias

[Author's view] CursorBench v3.1 is "a synthetic task pool derived from Cursor usage logs," and Composer 2.5 has been targeted-RL'ed against it. It is structurally predictable that an in-house model leads on its own in-house bench. Citing this as "we beat Opus 4.7" is the kind of overgeneralisation that backfires later. Use external benches (SWE-Bench Multilingual / Terminal-Bench 2.0) as the primary basis for selection decisions.

Risk 2: No external API blocks embedding

[Unverified] Composer 2.5 is exclusive to Cursor IDE and Cloud Agents — there is no standalone API like Anthropic's or OpenAI's. If you want to embed it into your own product (the Sales Claw scenario), your options are limited to:

Hit Background Agent inside Cursor IDE via webhook / CLI (within the official docs envelope)
Stay on Claude Opus 4.7 / GPT-5.5 API models in production and treat Composer 2.5 as an in-IDE dev-experience boost (Sales Claw's current recommendation)
Use the open Kimi K2.5 checkpoint and run your own similar RL pipeline (long-horizon, capital-intensive option)

Risk 3: Known evaluation gaps and Hacker News reactions

[Unverified] On Hacker News (item id 48182516), the thread carries a recurring criticism that "official Composer 2 benches diverge from real-world experience," and some of that has resurfaced for 2.5. This is opinion-grade information based on individual experience, but the broader correction — "frontier comparisons are ideal-condition results; production is its own thing" — is one you should always have on hand.

8. Sales Claw context — implications of domain-specialised targeted RL

[Author's view] Sales Claw's current architecture calls external API models like Claude Opus 4.7 and GPT-5.5, wraps them with policy controls (pre-send automated inspection, sales-policy NG detection, halting on CAPTCHA, send-rate limits, audit logging), and runs a self-driving loop on top. Cursor's Composer 2.5 strategy reads like a guide to what the next step could look like.

Why business-specific targeted RL matters

[Author's view] The dominant failure modes for B2B sales agents (Sales Claw included) are:

Hallucinating against the variation in real-world company website structures (declaring a form exists when it doesn't)
Misjudgement on CAPTCHA detection (counting a CAPTCHA-blocked attempt as a successful send)
Missing "sales NG words" in the final pre-send body check
Per-company premise drift during long batches

None of these get solved purely by "making the frontier LLM smarter." They sit squarely in the territory where "targeted RL with textual feedback against business-specific failure modes" actually helps. What Composer 2.5 demonstrates is the recipe you would apply on the business side. For Sales Claw, the realistic sequence is: pile on rule-based policy inspection and pre-send NG detection in the short term, accumulate operational logs into synthetic RL tasks, and keep the door open for a future specialised model.
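The short-term "rule-based policy inspection" step can be sketched as a single pre-send check. Everything here is hypothetical: the `Draft` fields, the NG-word list, and the hourly limit are placeholders for illustration, not Sales Claw's actual implementation.

```python
from dataclasses import dataclass

# Placeholder NG words; a real list would come from the sales policy.
NG_WORDS = ("guaranteed", "risk-free")


@dataclass
class Draft:
    company: str
    body: str
    form_found: bool        # did the agent actually locate a contact form?
    captcha_detected: bool  # did the page present a CAPTCHA?


def inspect(draft, sent_this_hour, hourly_limit=20):
    """Return the reasons to block the send; an empty list means it may proceed."""
    reasons = []
    if not draft.form_found:
        reasons.append("no_form")
    if draft.captcha_detected:
        reasons.append("captcha")
    lowered = draft.body.lower()
    reasons += [f"ng_word:{word}" for word in NG_WORDS if word in lowered]
    if sent_this_hour >= hourly_limit:
        reasons.append("rate_limit")
    return reasons


draft = Draft("Example Co.", "Try our Risk-Free plan today.", True, False)
print(inspect(draft, sent_this_hour=3))
```

Each blocked draft, together with its list of reasons, is also a natural record to log: the reasons are already a form of textual feedback, which is the raw material the article suggests accumulating for a future specialised model.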

Cost-shape implications

[Author's view] Cursor's "Standard / Fast two-tier" cost shape transfers cleanly to a sales agent like Sales Claw:

Standard (low cost): post-batch processing (company research, knowledge consolidation, send-history summaries)
Fast (high cost): interactive form input, real-time CAPTCHA detection, immediate policy decisions

Splitting one model across two price tiers maps neatly onto enterprise budgeting, and the Sales Claw pricing model could adopt the same shape.
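As a rough illustration of that split, the routing rule can be a plain lookup. The workload names below mirror the two bullets above, but the identifiers and the decision to reject unknown workloads are our own assumptions.

```python
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"  # low cost, batch
    FAST = "fast"          # high cost, interactive


# Workload labels are hypothetical names for the examples listed above.
BATCH = {"company_research", "knowledge_consolidation", "send_history_summary"}
INTERACTIVE = {"form_input", "captcha_detection", "policy_decision"}


def pick_tier(workload):
    """Map a workload label to a price tier; unknown labels are rejected explicitly."""
    if workload in INTERACTIVE:
        return Tier.FAST
    if workload in BATCH:
        return Tier.STANDARD
    raise ValueError(f"unclassified workload: {workload}")


print(pick_tier("form_input").value, pick_tier("company_research").value)
```

Raising on unclassified workloads, rather than silently defaulting to one tier, forces each new job type to get an explicit budget decision.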

9. Wrap-up — "chasing the frontier" gives way to "business-specific RL"

[Official] Cursor Composer 2.5 shipped 2026-05-18. Built on Moonshot Kimi K2.5 with a 25x scaled synthetic RL task pool, it lines up with Opus 4.7 and GPT-5.5 on SWE-Bench Multilingual at roughly 1/10 the cost. It is rolled out as the default model in the Cursor app and Cloud Agents, and the first week comes with 2x included usage.

[Author's view] The real significance of this release is that "a route to practical agent-grade quality without giant in-house weights" has become live. That expands the medium-term option set for business-specialised agents like Sales Claw by one full lane.

[Official] Cursor also announced it is training its next-generation model on xAI Colossus 2 at 10x the compute, aiming at frontier-tier with the Composer 3 line. Read the evaluation in this article as a snapshot of 2026-05-18.

This English version is a translation of the Japanese-language original. Where wording differs, the Japanese version governs.

FAQ

What is different between Composer 2.5 and Composer 2?

The base checkpoint (Moonshot Kimi K2.5) and the API base pricing (Standard $0.50 / $2.50) are unchanged from Composer 2. What is different: (1) synthetic RL task volume scaled 25x, (2) Sharded Muon optimizer with distributed orthogonalization for stable multi-GPU RL, (3) targeted RL with textual feedback aimed at specific failure modes (e.g. abandoning a tool-call loop early, losing consistency across multi-file edits). The result is 79.8% on SWE-Bench Multilingual (clear gain over Composer 2) and noticeable stability gains on the long-horizon agent tasks Cursor describes as "sustained work on long-running tasks." The Cursor app switches to Composer 2.5 as the default immediately, and Cloud Agents (background agent execution) defaults to the same model. We have not seen an explicit rollback-to-previous-model toggle in the UI as of this article.

In practice, what is it stronger at vs Opus 4.7 / GPT-5.5?

On SWE-Bench Multilingual it sits 0.7pt below Opus 4.7 (80.5%) and 2pt above GPT-5.5 (77.8%). On Terminal-Bench 2.0, Composer 2.5 ties Opus 4.7 (69.3% vs 69.4%) but falls well behind GPT-5.5 (82.7%). On Cursor's own CursorBench v3.1, Composer 2.5 hits 63.2%, beats Opus 4.7 xhigh (61.6%), and lands close to Opus 4.7 max (64.8%). Net: parity with Opus 4.7, mixed wins / losses vs GPT-5.5 depending on the task. The "under $1 per task vs up to $11 for competitors" claim is a CursorBench median band — you must re-measure against your own task mix. Practically, the strengths are stability across "in-IDE conversation / inline completion" and "long-running Cloud Agent" workloads on the same model, with materially better cost efficiency than pay-per-token API calls to Opus / GPT-5.5.

What do the Standard and Fast price tiers mean?

Standard is $0.50 / 1M input + $2.50 / 1M output (same as Composer 2); Fast is $3.00 / 1M input + $15.00 / 1M output, delivering "the same intelligence at higher throughput." The Cursor blog and changelog describe Fast as "high-throughput interactive use" — IDE inline completion, chat responses — while Standard fits backend batch on Cloud Agents. Anthropic / OpenAI expose dials like reasoning_effort and temperature on one model; Cursor instead provides "the same intelligence at two price points," which aligns well with enterprise budget management. The first week (from 2026-05-18) Pro / Business / Enterprise included usage is doubled, but absolute included-usage values per plan are not yet clearly on the public pricing page — wait for the official update.

Can I call Composer 2.5 directly via an external API?

Not as of 2026-05-18. There is no pay-per-token public API the way Anthropic and OpenAI offer. Whether one is planned is unconfirmed (no official statement). If you want to embed it in your own product, the current options are: (1) wrap Cursor's Background Agent / Cloud Agent inside the IDE via webhook or CLI (within official docs), (2) keep production on API models like Claude Opus 4.7 / GPT-5.5 and treat Composer 2.5 as an in-IDE dev-experience upgrade (Sales Claw's current pick), or (3) start from the open Kimi K2.5 checkpoint and run your own equivalent RL pipeline (a long-term capital-intensive bet). For a business agent like Sales Claw, the safe path is to bring Composer 2.5 in as a dev-experience tool while the production self-driving loop continues to run on API models.

Does it really hold up on long-horizon tasks?

The Cursor blog asserts improvement qualitatively — "sustained work on long-running tasks," "hundreds of tool calls" — but does not publish absolute numbers like "stable up to X minutes" or "up to N tool calls." On Cursor 0.50.x / Pro plan / Mac M2 Pro, the author ran "dependency audit → type-error fixes → test additions" (about 80 tool calls per loop) 3 times and all 3 completed (Composer 2 aborted partway about 1/3 of the time on the same task). Sample size 3 is far too small to claim significance. In production, measure for yourself which ceiling hits first: the Cloud Agents timeout (estimated ~60 minutes) or Composer 2.5's context window (inherited from Kimi K2.5's reported 200K–256K, both speculative). Before putting 200+-tool-call long-horizon tasks into production, measure empirical endurance against your task distribution.
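To put a number on why three runs prove little, a Wilson score interval for a success rate is a quick check. This is a generic statistics sketch, not something from Cursor's materials; the function name and the 95% level (z = 1.96) are our choices.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


low, high = wilson_interval(3, 3)  # the author's 3-out-of-3 result
print(f"95% interval for the true completion rate: {low:.2f} to {high:.2f}")
```

For 3 completions out of 3, the lower bound comes out at roughly 0.44, so the observation is still compatible with a model that fails more than half the time. A few dozen runs per task type would narrow this considerably.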

How should I read the CursorBench scores?

CursorBench is an agent benchmark synthesised from Cursor's in-house usage logs, and Composer 2.5 has been targeted-RL'ed on it. Structurally, the in-house model is going to be favoured. Citing "CursorBench v3.1: 63.2% / Opus 4.7 max 64.8% / GPT-5.5 59.2%" as "beats Opus 4.7" is the kind of overgeneralisation that comes back to bite you. A safer reading hierarchy: (1) external benches SWE-Bench Multilingual / Terminal-Bench 2.0 first, (2) CursorBench as a conditional upper bound "as long as you're running on Cursor," (3) your own repo + your own task distribution as the final arbiter. Read it not as a general-purpose performance verdict but as "Opus-grade quality on Cursor IDE / Cloud Agents at lower cost and higher speed," and the bench claims line up much more closely with reality.

How does this affect a business agent like Sales Claw?

What Composer 2.5 demonstrates is that you can land on practical quality comparable to the frontier without giant in-house weights — by stacking large-volume synthetic RL tasks and targeted RL with textual feedback. That expands the medium-term option set for domain-specialised business agents like Sales Claw. Especially for failure modes specific to the domain — "variation in real-world company website structures," "CAPTCHA misjudgement," "missed sales-NG words in the pre-send body check," "premise drift across long batches" — making the frontier LLM smarter alone is not enough. Targeted RL with textual feedback is what helps. The realistic short-term plan is to stay on Claude Opus 4.7 / GPT-5.5 API models with thick policy inspection layered on top, while accumulating operational logs into synthetic RL tasks so you keep the door open for a future specialised model. The Standard / Fast two-tier cost shape is directly applicable to Sales Claw's own pricing design.

What changes during the first-week 2x boost?

Cursor's blog states "For the next week, we are doubling the included usage of the model," so included usage on Pro / Business / Enterprise is doubled. It's a standard nudge to drive adoption of the new default model. Watch out: the absolute starting included usage per plan is what you read off your "Included usage" management screen — it is not all written down on the public pricing page yet. Recommended operations: (1) use this first week to measure Composer 2.5's task-completion rate and per-task cost on real workloads, (2) re-project costs back to the non-2x steady state when deciding to keep the plan, and (3) push heavy Cloud Agents long-horizon work into this window. Do not misread "what 2x feels like" as "what production will cost."

References

This article draws on official X accounts and official documentation as primary sources.

[01] Cursor, "Introducing Composer 2.5" (official blog), 2026-05-18
[02] Cursor, "Composer 2.5" (official changelog), 2026-05-18
[03] Cursor Forum, "Composer 2.5 is now live" (official announcement), 2026-05-18
[04] Cursor, "Composer 2" (official blog, prior model reference), 2025-10-29
[05] Cursor, "Cursor 2.0" (official blog, Composer 1 / Cursor 2.0 release), 2025-10-29
[06] Cursor, "Models Documentation" (official docs), 2026-05-18
[07] Cursor, "Pricing" (official), 2026-05-18
[08] Moonshot, "Kimi K2.5 Open Checkpoint" (official release, base checkpoint), 2025-09-01
