Flat-rate inference for AI agents

Unlimited Agent Inference

AIUs (Agent Inference Units) enable 24/7 agent operation with no per-token charges, via an OpenAI-compatible API running on NVIDIA B200 GPUs.

View Pricing
terminal
curl https://api.hyperclaw.app/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Built for AI Agents

Purpose-built infrastructure for autonomous AI workloads that run 24/7.

Unlimited Inference

No per-token pricing. Use as much as your agents need with flat-rate AIU subscriptions. Predictable costs for autonomous workloads.

OpenAI-Compatible API

Drop-in replacement for any OpenAI SDK client. Zero code changes needed — just swap your base URL and API key.

Frontier Models on B200 GPUs

Kimi K2.5, GLM-5, and MiniMax M2.5 — reasoning, vision, and tool use. ~36M tokens/hour per AIU with 4x burst.
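Since the API speaks the standard OpenAI Chat Completions format, tool use follows the usual `tools` parameter shape. A sketch of a tool definition (the `get_weather` function here is a made-up example for illustration, not part of this service):

```python
# Hypothetical tool definition in the standard OpenAI Chat Completions
# "tools" format. get_weather is illustrative, not a hyperclaw API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
# Passed as tools=[weather_tool] to client.chat.completions.create(...);
# the model may then respond with a tool_call instead of plain text.
```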

Crypto-Native Payments

Pay with USDC via the x402 protocol. Seamless on-chain subscriptions for agent-to-agent commerce.
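At its core, x402 turns HTTP's 402 Payment Required status into a machine-payable handshake: the server rejects the first request with its payment requirements, and the client signs a USDC payment and retries with it attached. A minimal sketch of that loop (the response shape, the `accepts` field, and the `X-PAYMENT` header name are illustrative assumptions, not verified protocol fields):

```python
# Hypothetical x402-style payment loop. Field names ("accepts",
# "X-PAYMENT") are illustrative assumptions, not verified constants.
def call_with_x402(request_fn, sign_payment):
    """request_fn(headers) performs the HTTP call; sign_payment(reqs)
    returns an encoded, signed payment for the server's requirements."""
    resp = request_fn(headers={})
    if resp["status"] == 402:
        # Server advertised payment requirements; pay and retry once.
        payment = sign_payment(resp["accepts"])
        resp = request_fn(headers={"X-PAYMENT": payment})
    return resp
```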

Simple, Predictable Pricing

Pay per AIU, not per token. Scale your agents without surprise bills.

Technical Specifications

Enterprise-grade infrastructure built for autonomous AI workloads.

~36M tokens/hour per AIU

Sustained throughput with 4x burst on frontier models

600K TPM / 3,000 RPM per AIU

Base rate per AIU with 4x burst capacity. Scales linearly with AIUs.
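The arithmetic behind these figures is easy to check: 600K tokens per minute sustained is the ~36M tokens/hour quoted above, and burst multiplies the per-minute rate by 4. A quick sketch of the scaling described on this page:

```python
# Per-AIU rate limits as stated on this page.
BASE_TPM = 600_000   # sustained tokens per minute
BASE_RPM = 3_000     # sustained requests per minute
BURST = 4            # burst multiplier

sustained_tokens_per_hour = BASE_TPM * 60   # 36,000,000 -> "~36M tokens/hour"
burst_tpm = BASE_TPM * BURST                # 2,400,000 TPM during bursts

def limits(aius: int) -> tuple[int, int]:
    """Limits scale linearly with the number of AIUs."""
    return aius * BASE_TPM, aius * BASE_RPM
```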

OpenAI SDK compatible

Works with any client that speaks the OpenAI Chat Completions API

example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperclaw.app/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)