June 2026 AI Price Cuts and Deals Guide:
DeepSeek 75% Off, OpenAI War, Cursor Half-Price Window

If you pay for LLM APIs or AI coding subscriptions in June 2026, your invoice is probably out of date. This guide maps the live AI price war: DeepSeek V4-Pro at a permanent 75% cut, OpenAI signaling major API reductions, Cursor referral pricing at 50% off the first month, Copilot Business summer credits, and Windsurf SWE-1.5 on a three-month free window. You get a target-audience matrix, API and editor deal tables, a cost-saving combo with prompt-caching benchmarks, a six-step runbook, a master eight-product comparison, and six FAQ answers with verifiable official links. Node tiers sit on the NOVAKVM pricing page.

Competition in the first half of 2026 shifted from model leaderboard bragging to who can charge less per token. Three forces converged in June to create the strongest buyer's market in roughly two years.

  • DeepSeek catfish effect: DeepSeek V4-Pro delivers near top-tier closed-model performance while cache-hit input pricing runs about 1/700 of GPT-5.5 Pro on a per-million-token basis. That gap forced OpenAI, Anthropic, and Google to defend share with discounts rather than feature gaps alone.
  • IPO pressure on OpenAI and Anthropic: Both companies reportedly filed confidential IPO paperwork with the SEC. Pre-IPO user growth favors holding developers with aggressive pricing rather than raising rates before the roadshow.
  • Enterprise budget cuts: The Wall Street Journal reported that large tech buyers including Uber exhausted annual AI budgets before April 2026, with some firms trimming usage 20–30%. Vendors responded with volume pricing and promotional credits to keep seats active.

Several deals in this article carry hard deadlines (Copilot summer credits through August 31; Windsurf SWE-1.5 trial window). Others are structural (DeepSeek permanent cut). Treat June 2026 as a window to renegotiate your stack, not a reason to buy every subscription at once.

Who should act on which June 2026 AI deal
Your role Primary win from this guide
Solo / indie developer Cursor 50% first month via referral; DeepSeek V4-Pro API at 75% off permanently
Engineering lead / team owner Copilot Business or Enterprise summer credit bump; model-routing runbook for prod APIs
AI product founder OpenAI cut timing for GPT-5.6; DeepSeek open-ecosystem margin headroom
Content creator / writer Subscription timing matrix; Gemini Flash-Lite for long-context drafts
Industry observer Full price-war map with citable numbers and official source links

Public API pricing as of June 17, 2026. Re-open each vendor page before you wire production traffic; rates change without much notice.

DeepSeek V4-Pro — permanent 75% off (effective May 31, 2026): On May 22 DeepSeek announced that a planned return to full price would not happen; the 2.5x discount tier became permanent. Output throughput and concurrency were expanded May 23 (default 500 concurrent online).

DeepSeek V4-Pro API pricing (permanent, from May 31, 2026)
Billing item Price
Input (cache hit) ¥0.025 / million tokens
Input (cache miss) ¥3 / million tokens
Output ¥6 / million tokens

Reference: GPT-5.5 Pro cached input runs about $30 per million tokens (~¥218). DeepSeek cache-hit input is roughly 1/700 of that benchmark. V4-Pro leads public open-weight scores on math, STEM, and competition-grade code; agent multi-step reliability improved versus V4.

OpenAI — expected cuts (WSJ June 10, 2026): The Wall Street Journal reported internal discussions of a large API token price reduction. CEO Sam Altman publicly stated OpenAI would help users get more value for less money. GPT-5.6 is expected late June 2026; market consensus points to roughly $5–8 input / $25–40 output, below Anthropic Fable 5 at $10 / $50.

OpenAI main model API pricing snapshot (June 2026; Batch API 50% off all rows)
Model Input Output Context
GPT-5.5 $5.00 / M $30.00 / M 128K
GPT-5.4 $2.50 / M $15.00 / M 1M
GPT-5 $1.25 / M $10.00 / M 128K
GPT-4.1 $2.00 / M $8.00 / M 1M
GPT-4.1 Nano $0.10 / M $0.40 / M 1M

Immediate OpenAI savings without waiting for GPT-5.6: Prompt Caching (50% off on repeated prefixes), Batch API (50% off, 24-hour async return), and routing simple work to GPT-4.1 Nano at $0.10 / M input.

Google Gemini 2.5 — cheapest 1M-context tier: Flash-Lite at $0.10 / M input is among the lowest-priced million-token models. Pro and Flash cover heavier reasoning and multimodal workloads.

Gemini 2.5 series API pricing (June 2026)
Model Input Output Context
Gemini 2.5 Pro $1.25 / M (≤200K); $2.50 / M (>200K) $10.00 / M 1M
Gemini 2.5 Flash $0.30 / M $2.50 / M 1M
Gemini 2.5 Flash-Lite $0.10 / M $0.40 / M 1M

Anthropic Claude — SDK billing pause (June 15, 2026): Anthropic planned to bill Claude Agent SDK usage (programmatic claude -p, third-party tools) at API rates outside subscription quotas effective June 15. On the effective date Anthropic paused the change, stating usage would stay on current plans while they redesign support. Pro ($20/month) and Max ($100 / $200 per month) tiers still include SDK-style usage for now; a future repricing remains likely before IPO.

Re-open vendor docs before you commit budget.

https://platform.deepseek.com/docs

https://openai.com/api/pricing

https://ai.google.dev/gemini-api/docs/pricing

https://www.anthropic.com/pricing

Cursor referral — 50% off first month: Cursor's referral program (limited rollout, confirmed May 2026) gives new users 50% off the first month on Pro, Pro+, or Ultra when signing up through a valid referral link. Referrers earn $25 usage credit per successful signup (cap 10 per month).

Cursor list price vs referral first-month price
Plan List price First month (referral)
Pro $20 / month $10 / month
Pro+ $40 / month $20 / month
Ultra $200 / month $100 / month

Find active links in developer communities (Reddit r/cursor, X, Discord). Format resembles cursor.com/signup?ref=XXXXXXXX. Official referral links are supported; avoid third-party crack codes. Heavy Composer agent use can push real spend toward $60+/month even on Pro.

GitHub Copilot summer credits (through August 31, 2026): Copilot moved new accounts to usage-based AI credits on June 1, 2026 (1 credit ≈ $0.01). Business and Enterprise tiers receive promotional credit pools June through August:

Copilot Business and Enterprise: standard vs summer promotional credits
Plan Monthly fee Standard credits Summer credits (Jun–Aug 2026)
Copilot Business $19 / user / month $19 AI credits $30 AI credits (+58%)
Copilot Enterprise $39 / user / month $39 AI credits $70 AI credits (+79%)

Copilot Pro stays $10/month for individuals. Auto model selection adds a 10% credit discount on eligible calls. Annual subscribers may still be on legacy Premium Request billing until renewal; evaluate monthly vs annual before August.

Windsurf SWE-1.5 — three months free: Windsurf (formerly Codeium) is promoting SWE-1.5, a code-focused near-frontier model, free for roughly three months including free-tier accounts. Paid tiers: Free ($0, unlimited completion + 25 Cascade credits/month), Pro (~$15–20/month, 500 prompt quota + premium models), Max ($200/month for heavy agents). Cascade runs multi-step autonomous edits; Arena Mode compares models side by side.

Windsurf Pro vs Cursor Pro (June 2026)
Dimension Windsurf Pro Cursor Pro
Price $15–20 / month $20 / month
Free tier Permanent (25 Cascade credits / month) ~2-week trial
Agent style Cascade (more autonomous) Composer (finer-grained diffs)
Best for Budget-conscious agent experiments Large multi-file refactors

June editor math: new Cursor users should hunt a referral before paying list price; GitHub-native teams should onboard Business or Enterprise before September; budget builders should burn Windsurf SWE-1.5 free hours while routing API spend through DeepSeek or Gemini Flash-Lite.

Re-open editor billing pages before checkout.

https://forum.cursor.com/t/cursor-referral-program/40555

https://github.blog/changelog/2026-06-01-updates-to-github-copilot-billing-and-plans/

https://docs.windsurf.com/windsurf/accounts/usage

Even without promotional pricing, three engineering habits typically cut API spend 40–80% on mixed workloads.

Model routing (40–80% savings): Reserve frontier models for architecture, hard bugs, and agent planning. Route daily Q&A, summaries, classification, and tagging to mini tiers.

model-routing.txt
Hard     GPT-5.4 / Claude Sonnet 4.x / DeepSeek V4-Pro
Daily    GPT-4.1 mini / Gemini 2.5 Flash
Cheap    GPT-4.1 Nano ($0.10) / Gemini Flash-Lite ($0.10) / DeepSeek Flash (¥0.02 cache hit)

Routing ~70% of requests to small models often drops cost 60–75% with measured quality loss under 3% on classification and summarization tasks.

Prompt caching: Keep a stable system prompt at the start of every request to maximize cache hits.

Prompt caching discounts by platform (June 2026)
Platform Cache discount Typical use
Anthropic 90% off (0.1x price) RAG, support bots, long docs
OpenAI 50% off (auto on repeat prefix) Any app with fixed system prompt
Google 75% off Long-context pipelines
DeepSeek ¥0.025 / M on cache hit High-volume repeated prompts

Batch API (~5x effective throughput per dollar on async work): OpenAI, Anthropic, Google, and major Chinese clouds offer Batch endpoints at roughly 50% off with up to 24-hour turnaround. Use for bulk labeling, report generation, and offline evals—not live chat.

Example mid-size app (~100M tokens/month): 60% small-model routing (~45% save), caching (~20%), Batch for reports (~10%), output token caps (~5%) compounds to roughly 80% total reduction versus all-frontier defaults.

  1. Audit last month's token mix: Export billing by model and feature. Tag calls as hard reasoning, daily assist, or bulk offline. That split drives routing rules.
  2. Lock DeepSeek V4-Pro for CN-friendly prod: Register at platform.deepseek.com, fund in CNY if applicable, point OpenAI-compatible clients at DeepSeek for coding and Chinese workloads.
  3. Stage OpenAI/Gemini for wait-or-route: Light users delay prepaid top-ups until GPT-5.6 or WSJ-promised cuts land. Heavy users route 70% traffic to Flash-Lite or Nano today.
  4. Capture editor promos: Apply Cursor referral at signup; confirm Copilot Business/Enterprise summer credits in GitHub Billing; enable Windsurf SWE-1.5 before the trial ends.
  5. Enable caching and Batch in code: Pin system prompts; turn on provider caching flags; move nightly jobs to Batch queues. Set max output tokens per endpoint.
  6. Monitor and re-balance monthly: Track cache hit rate, credit burn, and agent overages. Adjust routing tables when vendors change list prices. See the help center for baseline remote-agent host setup if you run cron or Cloud Agents.

Eight AI products worth acting on in June 2026
Product Deal Discount Deadline Urgency
DeepSeek V4-Pro API Permanent 25% of list; cache input ¥0.025 / M 75% off permanent None Available now
Cursor (new user) Referral first month half price 50% off month 1 Rolling Use while links circulate
Copilot Business Summer credits $30 vs $19 fee +58% credits for 3 months 2026-08-31 Hard deadline
Copilot Enterprise Summer credits $70 vs $39 fee +79% credits for 3 months 2026-08-31 Hard deadline
Windsurf SWE-1.5 Near-frontier code model free trial Free ~3 months Promo window Test before paid tier
Claude subscription SDK billing hike paused June 15 Quota still covers SDK Until next notice Use Max while terms hold
OpenAI API WSJ-reported large cut; GPT-5.6 late June TBD Est. Jun–Jul 2026 Wait or route down
Gemini 2.5 Flash-Lite Lowest 1M-context entry tier $0.10 / M input None Available now
  • DeepSeek V4-Pro cache hit: ¥0.025 / million input tokens permanent from May 31, 2026; versus GPT-5.5 Pro cached input ~$30 / M (~1/700 ratio on cited June 2026 benchmarks).
  • OpenAI Batch API: 50% discount on listed synchronous rates across GPT-5.x and GPT-4.1 family; 24-hour async SLA.
  • GitHub AI Credit: 1 credit = $0.01 USD; Business summer pool $30 vs $19 subscription through August 31, 2026.
  • Claude Max tiers: $100/month (5x) and $200/month (20x); SDK programmatic billing change paused effective June 15, 2026.
  • Gemini 2.5 Flash-Lite: $0.10 / M input, $0.40 / M output, 1M token context window.
  • Model routing benchmark: Shifting 70% of requests to mini/Flash tiers yields 60–75% cost drop with <3% measured quality loss on summarization and classification workloads (internal-style June 2026 estimates; re-run on your traffic).

Is DeepSeek V4-Pro practical for international developers? Yes. The platform uses OpenAI-compatible endpoints; pricing is in CNY with straightforward signup. Aggregators (SiliconFlow, Alibaba Cloud Model Studio, and similar) add domestic routing and savings plans if you need multi-region redundancy.

Are Cursor referral links legitimate? Cursor confirmed the referral program on its forum. Signing up through an official referral URL is supported and distinct from cracked license keys. Verify the link resolves to cursor.com before payment.

Do Copilot summer credits require a promo code? No. Business and Enterprise workspaces automatically receive the elevated June–August 2026 credit pools ($30 and $70 respectively). Standard quotas return in September unless GitHub extends the promo.

Claude or GPT for coding in June 2026? Code-heavy work: Claude Sonnet 4.x or DeepSeek V4-Pro for cost-performance. Broad reasoning: GPT-5.4 or Gemini 2.5 Pro. Minimum spend: DeepSeek V4-Flash (Chinese stacks) or Gemini 2.5 Flash-Lite (global stacks).

What happens after Windsurf SWE-1.5 free months end? Usage falls back to normal Cascade credit consumption on Free or paid tiers. Run your real repos during the promo to decide whether Pro at ~$15–20 beats Cursor after referral pricing expires.

How should I react when OpenAI announces cuts? Re-run model routing: promote workloads from GPT-5.4 to GPT-5.5 or GPT-5.6 if per-token cost drops. Prepaid balances typically retain face value; no need to burn credits early unless your contract says otherwise.

Chasing every June discount still leaves a gap: agents, Batch cron jobs, and Cloud IDE background tasks need a host that never sleeps. Laptops drop SSH sessions, throttle on thermals, and lose OAuth refresh tokens when lids close. Stacking cheaper tokens on an unreliable machine saves API dollars but wastes agent hours.

When you need 7x24 CLI agent uptime, stable SSH, and predictable Apple Silicon compute for Cursor Cloud Agent, Claude Code cron, or Windsurf Cascade overnight runs, dedicated bare metal beats another subscription layer. NOVAKVM rents multi-region Mac Mini M4 and M4 Pro nodes sized for agent automation and iOS CI on the same box. See the pricing page for tiers, the order page to provision, and the help center for baseline remote setup.