LLM APIs
API providers for large language models, including free tiers and local options.
Tool | Category | Segment | Provider | Plan | Monthly Price USD | Billing Model | Free Tier / Trial | Included Usage / Credits | Overages / Top-ups | API Compatibility | Model Access | Reference Model Price Input USD / 1M Tokens | Reference Model Price Output USD / 1M Tokens | Cheap Model Price Input USD / 1M Tokens | Cheap Model Price Output USD / 1M Tokens | Context / Rate Limits | Data Privacy / Training | Best Fit | Main Limits / Caveats |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No tagline | LLM APIs | Frontier model API | OpenAI API | Pay-as-you-go | $0 subscription | Token-based API usage | No standing free tier | No monthly included usage published on pricing page | Prepaid/API billing by model and feature | Native OpenAI API; broad SDK support | GPT-5.5, GPT-5.1, GPT-5.4 mini, GPT-4.1 family, realtime/audio/image tools | $5.00 in for GPT-5.5 | $30.00 out for GPT-5.5 | $0.75 in for GPT-5.4 mini | $4.50 out for GPT-5.4 mini | Rate limits depend on account tier and model | API data is governed by OpenAI API data controls; verify org retention settings | Default choice for broad SDK support and frontier models | No free API quota; costs can rise quickly with long context/tool calls |
No tagline | LLM APIs | Frontier model API | OpenAI API | Batch API | $0 subscription | 50 percent lower token price for async batch jobs | ✕ | No monthly included usage | Same API billing, discounted for batch-compatible workloads | Native OpenAI Batch API | Same supported batchable OpenAI models/features | $2.50 in equivalent for GPT-5.5 batch | $15.00 out equivalent for GPT-5.5 batch | $0.375 in equivalent for GPT-5.4 mini batch | $2.25 out equivalent for GPT-5.4 mini batch | Async batch window; not for interactive latency | Same API data controls as standard OpenAI API | Offline evals, document processing, synthetic data, backfills | Not suitable for realtime app UX |
No tagline | LLM APIs | Frontier model API | Anthropic Claude API | Pay-as-you-go | $0 subscription | Token-based API usage | Console credits/trials vary by account | No fixed monthly included usage published on pricing page | Pay by model; prompt caching and batch discounts available | Anthropic Messages API; SDKs; many gateways support Claude | Claude Opus 4.8, Claude Sonnet 4.5, Claude Haiku 4.5 | $5.00 in for Claude Opus 4.8 | $25.00 out for Claude Opus 4.8 | $1.00 in for Claude Haiku 4.5 | $5.00 out for Claude Haiku 4.5 | Rate limits depend on API tier; prompt caching available | Anthropic API data policy; verify retention and zero-retention eligibility | Claude-native apps, reasoning, coding, long-context workflows | Model access and limits vary by account and region |
No tagline | LLM APIs | Frontier model API | Anthropic Claude API | Batch API | $0 subscription | 50 percent discount for batch processing | ✕ | No fixed monthly included usage | Batch jobs billed at discounted token prices | Anthropic Message Batches API | Batchable Claude models | $2.50 in for Claude Opus 4.8 batch | $12.50 out for Claude Opus 4.8 batch | $0.50 in for Claude Haiku 4.5 batch | $2.50 out for Claude Haiku 4.5 batch | Async batch processing; not low latency | Same Anthropic API policy; verify retention settings | Large non-interactive processing and eval workloads | Batch output is delayed; not for chat UX |
No tagline | LLM APIs | Frontier model API | Google Gemini API | Free | $0 | Free quota by model | ✓ | Free tier available for selected Gemini API models; limits vary by model and region | Upgrade to paid tier through Google AI Studio / Google Cloud billing | Google Gemini API; OpenAI-compatible endpoint available for some workflows | Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite | Free where listed for eligible models | Free where listed for eligible models | $0 on free tier | $0 on free tier | Free tier rate limits are lower and vary by model | Free tier prompts/responses may be used to improve Google products; paid tier not used for training per pricing page | No-cost prototyping with strong models | Free tier quotas can change and are not for sensitive data unless policy is acceptable |
No tagline | LLM APIs | Frontier model API | Google Gemini API | Paid tier | $0 subscription | Token-based API usage | ✕ | No monthly included usage; pay per token | Billed by model, context length and modality | Google Gemini API / Google Cloud billing | Gemini 3 Pro Preview, Gemini 2.5 Pro, Flash, Flash-Lite | $2.00 in for Gemini 3 Pro Preview | $12.00 out for Gemini 3 Pro Preview | $0.10 in for Gemini 2.5 Flash | $0.40 out for Gemini 2.5 Flash | Paid tier has higher rate limits than free tier; exact quotas by model | Paid tier inputs/outputs are not used to improve Google products per pricing page | Production apps that need Gemini pricing and Google ecosystem | Long-context, grounding and modality prices differ by model |
No tagline | LLM APIs | European model API | Mistral La Plateforme | Pay-as-you-go | $0 subscription | Token-based API usage | Free tier may require opt-in/data-training settings; verify account | No fixed monthly included usage on pricing table | Pay by model and feature | Mistral API; OpenAI-compatible integrations available through clients/gateways | Mistral Large, Medium, Small, Codestral, Magistral, embedding/OCR/audio models | $2.00 in for Ministral 3 14B on pricing table | $6.00 out for Ministral 3 14B on pricing table | $0.10 in for Mistral Small 3.2 | $0.30 out for Mistral Small 3.2 | Rate limits depend on workspace/tier | EU-focused provider; verify free-tier training opt-in and enterprise privacy needs | Developers wanting European provider and strong open/proprietary models | Model list and prices move often; some products are non-text modalities |
No tagline | LLM APIs | Fast inference API | GroqCloud | Free | $0 | Free developer quota | ✓ | Free limits by model, e.g. requests/day and tokens/minute in Groq rate-limit docs | Upgrade to Dev Tier / paid usage for higher limits | OpenAI-compatible API surface for many chat workflows | Llama, Qwen, DeepSeek, GPT-OSS, Whisper and other fast-hosted models | $0 on free quota | $0 on free quota | $0 on free quota | $0 on free quota | Free limits are model-specific; examples include RPM/TPM/RPD/TPD limits | Verify Groq data processing and retention terms for production | Ultra-low-latency open model inference and prototypes | Free quota is generous but not guaranteed for production |
No tagline | LLM APIs | Fast inference API | GroqCloud | Paid usage | $0 subscription | Token-based API usage | ✕ | No fixed monthly included usage | Pay by model; higher limits through paid tiers | OpenAI-compatible API surface for many chat workflows | Llama, Qwen, DeepSeek, GPT-OSS, Whisper and other hosted models | Model-specific pricing | Model-specific pricing | Model-specific pricing | Model-specific pricing | Paid limits depend on usage tier | Verify Groq data processing and retention terms for production | Production low-latency open-model apps | Official pricing is model-specific and may require console context |
No tagline | LLM APIs | Model gateway | OpenRouter | Free models | $0 | Free model quota | ✓ | Free models and shared quota; local resource noted 20 RPM and many free models | Buy credits for paid models or BYOK routing | OpenAI-compatible gateway API | Hundreds of hosted models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek and others | $0 for models marked :free | $0 for models marked :free | $0 for models marked :free | $0 for models marked :free | Free model limits and availability vary by provider | Provider routing and data policy depend on selected model/provider | Trying many models without separate provider accounts | Free models can disappear or throttle; production should pin fallback models |
No tagline | LLM APIs | Model gateway | OpenRouter | Pay-as-you-go | $0 subscription | Prepaid credits / token-based routing | ✕ | No monthly included usage | Buy credits; model prices pass through with OpenRouter routing | OpenAI-compatible gateway API | Commercial and open models from many providers | Model-specific pass-through pricing | Model-specific pass-through pricing | Model-specific pass-through pricing | Model-specific pass-through pricing | Limits depend on model/provider and account balance | Data handling depends on routed provider; check per-model provider policy | One API key for multi-model routing and fallback | Adds gateway dependency and per-model policy complexity |
No tagline | LLM APIs | Model gateway | Hugging Face Inference Providers | Free user | $0 | Monthly included credits | ✓ | $0.10 monthly inference credits for free users | Pay-as-you-go after credits where available | Hugging Face routed provider APIs; provider SDKs and HF libraries | Many open models across supported inference providers | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider/model-specific | Credits and provider availability vary by model/provider | Data policy depends on provider routed through Hugging Face | Light experimentation with open models | Tiny free credit amount; not enough for serious production |
No tagline | LLM APIs | Model gateway | Hugging Face Inference Providers | PRO | $9/user | Subscription with monthly credits | ✕ | $2 monthly inference credits included for PRO users | Pay-as-you-go beyond included credits | Hugging Face routed provider APIs; provider SDKs and HF libraries | Many open models across supported inference providers | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider quotas and availability vary | Data policy depends on selected provider | Developers already using Hugging Face Hub | Credits are modest; high volume still pay-as-you-go |
No tagline | LLM APIs | Model gateway | Hugging Face Inference Providers | Team | $20/user | Team subscription with org credits | ✕ | $2 monthly inference credits per seat included | Pay-as-you-go beyond included credits | Hugging Face routed provider APIs; provider SDKs and HF libraries | Many open models across supported inference providers | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider/model-specific | Org billing and quotas depend on provider | Data policy depends on selected provider | Small teams using HF org workflows | Enterprise custom options excluded |
No tagline | LLM APIs | Low-cost model API | DeepSeek API | Pay-as-you-go | $0 subscription | Token-based API usage | ✕ | No fixed monthly included usage on pricing page | Pay by model; off-peak discounts may apply | OpenAI-compatible API | deepseek-chat, deepseek-reasoner and related models | $0.56 in for deepseek-chat (standard cache-miss pricing) | $1.68 out for deepseek-chat | $0.028 in for deepseek-chat (cache hit) | $1.68 out for deepseek-chat | Rate limits depend on account and model | Verify DeepSeek data/security policy before proprietary workloads | Very low-cost reasoning/chat API | Availability and regional policy may matter for commercial use |
No tagline | LLM APIs | Edge/serverless model API | Cloudflare Workers AI | Free | $0 | Free daily allocation | ✓ | 10,000 neurons/day free allocation | Upgrade to Workers Paid for higher allocation and pay-as-you-go neurons | Workers AI REST/API bindings; runs inside Cloudflare Workers | Cloudflare-hosted open models including Llama, Qwen, Mistral, Gemma, Whisper and embeddings | Neuron-based, model-specific | Neuron-based, model-specific | Neuron-based, model-specific | Neuron-based, model-specific | Free allocation resets daily | Cloudflare account data/security terms apply | Edge apps and prototypes already on Cloudflare | Pricing unit is neurons, not simple token price |
No tagline | LLM APIs | Edge/serverless model API | Cloudflare Workers AI | Paid | $5 account minimum for Workers Paid | Subscription plus usage | ✕ | Higher Workers platform limits; Workers AI charged by neurons | Pay-as-you-go beyond free allocation | Workers AI REST/API bindings; Cloudflare Workers integration | Cloudflare-hosted open models | Neuron-based, model-specific | Neuron-based, model-specific | Neuron-based, model-specific | Neuron-based, model-specific | Account/platform limits depend on Workers plan | Cloudflare account data/security terms apply | Production edge AI workloads | Neuron pricing is harder to compare against token APIs |
No tagline | LLM APIs | Open model inference API | Fireworks AI | Trial credits | $0 | Signup credit / no payment method path | ✓ | Local resource and pricing page indicate free/trial credits for new accounts | Move to pay-as-you-go or monthly Fire Pass | OpenAI-compatible API for many serverless models | Open-source and partner models, image/audio and fine-tune options | $0 until credits are exhausted | $0 until credits are exhausted | $0 until credits are exhausted | $0 until credits are exhausted | Limits depend on account and selected model | Verify model/provider data policy and Fireworks retention terms | Testing hosted open models quickly | Trial credit amount can change; verify account console |
No tagline | LLM APIs | Open model inference API | Fireworks AI | Pay-as-you-go | $0 subscription | Token/usage-based serverless inference | ✕ | No fixed monthly included usage | Pay by model; serverless and dedicated deployments available | OpenAI-compatible API for many serverless models | Llama, DeepSeek, Qwen, Mixtral and many open models | Model-specific pricing | Model-specific pricing | Model-specific pricing | Model-specific pricing | Limits depend on account and deployment type | Verify model/provider data policy and Fireworks retention terms | Open model production inference with good latency | Dedicated deployments and enterprise options excluded |
No tagline | LLM APIs | Open model inference API | Fireworks AI | Fire Pass | $49/user | Monthly subscription / access pass | ✕ | Fire Pass gives access to Fireworks app/API benefits listed on pricing page | Usage and premium models may still have limits depending on account | Fireworks API and app surfaces | Fireworks-hosted models | Plan-specific | Plan-specific | Plan-specific | Plan-specific | Plan details vary by account/product | Verify data policy for selected model and deployment | Users who want a monthly Fireworks bundle | Not as transparent as pure token PAYG |
No tagline | LLM APIs | Open model inference API | Together AI | Pay-as-you-go | $0 subscription | Token-based serverless inference | Trial/promotional credits may vary by account | No fixed public monthly allowance in pricing docs | Pay by model; dedicated endpoints available | OpenAI-compatible API and Together SDK | Meta Llama, Qwen, DeepSeek, Mistral, FLUX and other open models | Model-specific pricing | Model-specific pricing | Model-specific pricing | Model-specific pricing | Rate limits and quotas depend on account/model | Verify Together data retention/training terms for production | Hosted open model inference and fine-tuning ecosystem | Trial credits are account-dependent; dedicated endpoints excluded |
No tagline | LLM APIs | Model gateway | Vercel AI Gateway | Free | $0 | Monthly included credits | ✓ | $5/month in included AI Gateway credits on Free per docs | Buy credits / upgrade Vercel plan for more | AI Gateway routes to model providers; Vercel AI SDK friendly | OpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providers | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider/model-specific | Usage limited by included credits and provider routing | Data policy depends on Vercel gateway and selected provider | Next.js/Vercel projects needing one AI gateway | Best when already on Vercel; provider policies still matter |
No tagline | LLM APIs | Model gateway | Vercel AI Gateway | Pro plan credits | $20/user Vercel Pro base | Vercel plan plus AI Gateway usage credits | ✕ | $15/month in included AI Gateway credits on Pro per docs | Buy additional credits; provider/model-specific charges | AI Gateway routes to model providers; Vercel AI SDK friendly | OpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providers | Provider/model-specific | Provider/model-specific | Provider/model-specific | Provider/model-specific | Usage limited by credits, plan and model/provider routing | Data policy depends on Vercel gateway and selected provider | Production apps deployed on Vercel | Enterprise custom tier excluded |
No tagline | LLM APIs | Model API | AI21 Studio | Free Trial | $0 | Trial credits | ✓ | $10 trial credits for 3 months listed on pricing page | Move to pay-as-you-go after credits expire/exhaust | AI21 API | Jamba, Jurassic/AI21 models and task-specific endpoints | $2.00 in for Jamba Large 1.7 | $8.00 out for Jamba Large 1.7 | $0.20 in for Jamba Mini 1.7 | $0.40 out for Jamba Mini 1.7 | Rate limits depend on account and model | Verify AI21 data handling terms | Trying Jamba models and AI21 task APIs | Trial expires after stated period |
No tagline | LLM APIs | Model API | AI21 Studio | Pay-as-you-go | $0 subscription | Token-based API usage | ✕ | No fixed monthly included usage | Pay by model after trial | AI21 API | Jamba, Jurassic/AI21 models and task-specific endpoints | $2.00 in for Jamba Large 1.7 | $8.00 out for Jamba Large 1.7 | $0.20 in for Jamba Mini 1.7 | $0.40 out for Jamba Mini 1.7 | Rate limits depend on account and model | Verify AI21 data handling terms | Apps needing AI21/Jamba models | Smaller ecosystem than OpenAI/Anthropic/Gemini |
No tagline | LLM APIs | Search-grounded LLM API | Perplexity API | Pay-as-you-go | $0 subscription | Token + search/request pricing | ✕ | No fixed monthly included usage | Pay by model plus search/context features | Perplexity API | Sonar, Sonar Pro, Sonar Reasoning, Sonar Deep Research | $1.00 in for Sonar Pro (text token pricing) | $1.00 out for Sonar Pro, plus search/request costs | $1.00 in for Sonar | $1.00 out for Sonar | Limits depend on account tier and model | Search data and provider terms apply; verify citations/privacy needs | Grounded answers, research assistants, search-heavy apps | Pricing includes request/search components, not only tokens |
No tagline | LLM APIs | Hosted model marketplace | Replicate | Pay-as-you-go | $0 subscription | Usage-based compute/model pricing | Limited free usage may vary by account/model | No fixed monthly included usage | Pay by model runtime/prediction; some models have per-second or per-run pricing | Replicate API and client libraries | Open-source text, image, video, audio and multimodal models | Model/runtime-specific | Model/runtime-specific | Model/runtime-specific | Model/runtime-specific | Limits vary by account and model hardware | Replicate/model owner policies apply | Trying many open models across modalities | Text LLM costs are harder to normalize than pure token APIs |
No tagline | LLM APIs | Enterprise-friendly model API | Cohere | Trial | $0 | Trial key / trial limits | ✓ | Trial API key is limited and non-commercial per local resource; official docs route pricing by model | Upgrade to production/API billing | Cohere API | Command, Embed, Rerank, Aya and related models | $0 until trial quota exhausted | $0 until trial quota exhausted | $0 until trial quota exhausted | $0 until trial quota exhausted | Trial rate limits and monthly request limits apply | Verify Cohere trial/commercial data terms | Testing Command and Rerank APIs | Trial may be non-commercial and quota-limited |
No tagline | LLM APIs | Enterprise-friendly model API | Cohere | Pay-as-you-go | $0 subscription | Token/request-based API usage | ✕ | No fixed monthly included usage | Pay by model/task | Cohere API | Command, Embed, Rerank, Aya and related models | Model/task-specific | Model/task-specific | Model/task-specific | Model/task-specific | Production limits depend on account/model | Cohere enterprise/privacy posture; verify exact retention setting | RAG apps needing rerank/embedding plus chat | Pricing differs by task; not just chat tokens |
No tagline | LLM APIs | Accelerated inference API | NVIDIA NIM API Catalog | Free credits | $0 | Signup credits / hosted API catalog | ✓ | Signup credits for NVIDIA-hosted NIM API catalog; local resource notes 1K credits signup | Buy/upgrade through NVIDIA ecosystem or self-host NIM | NVIDIA-hosted API endpoints and NIM containers | Llama, Mistral, Qwen, Nemotron and other NIM-hosted models | $0 until credits exhausted | $0 until credits exhausted | $0 until credits exhausted | $0 until credits exhausted | Credit, RPM and verification requirements apply | NVIDIA terms and selected model policy apply | Trying optimized NIM-hosted open models | Credit system is less transparent than token-price APIs |
No tagline | LLM APIs | Accelerated inference API | Cerebras Inference | Free | $0 | Free developer quota | ✓ | Free usage tier and model rate limits shown in Cerebras pricing/rate-limit docs | Upgrade to Developer / paid usage for higher limits | OpenAI-compatible API | Llama, Qwen, GPT-OSS and Cerebras-hosted fast inference models | $0 on free quota | $0 on free quota | $0 on free quota | $0 on free quota | Free rate limits are model-specific | Verify Cerebras data terms for production | Fast open-model inference experiments | Free quota is not production capacity |
No tagline | LLM APIs | Accelerated inference API | Cerebras Inference | Developer | $0 subscription | Token-based paid API usage | ✕ | No fixed monthly included usage | Pay by model/token once paid usage is enabled | OpenAI-compatible API | Llama, Qwen, GPT-OSS and Cerebras-hosted fast inference models | Model-specific pricing | Model-specific pricing | Model-specific pricing | Model-specific pricing | Paid limits higher than free where available | Verify Cerebras data terms for production | Low-latency open-model inference at scale | Exact model prices and limits require current pricing table/console |
No tagline | LLM APIs | Local/self-hosted API | Ollama | Local | $0 + hardware | Free local software | ✓ | Unlimited local usage subject to local hardware | No vendor overage; pay hardware/electricity/cloud GPU if used | Ollama API; OpenAI-compatible endpoint support documented | Local open models such as Llama, Qwen, Mistral, Gemma and custom Modelfiles | $0 software cost | $0 software cost | $0 software cost | $0 software cost | Limited by local CPU/GPU/RAM and model size | Local-first; provider training does not apply unless using remote models | Private prototyping and offline/local workflows | Requires hardware and model management; quality depends on local model |
No tagline | LLM APIs | Local/self-hosted API | LM Studio | Local | $0 + hardware | Free local app/server | ✓ | Unlimited local usage subject to local hardware | No vendor overage; pay hardware/electricity/cloud GPU if used | OpenAI-like local server API | Local GGUF/open models downloadable through LM Studio | $0 software cost | $0 software cost | $0 software cost | $0 software cost | Limited by local CPU/GPU/RAM and model size | Local-first; no provider training for local inference | Non-technical local API and model testing | Desktop app dependency; production self-hosting needs care |
No tagline | LLM APIs | Local/self-hosted API | LocalAI | Self-hosted | $0 + hardware | Free open-source software | ✓ | Unlimited local/self-hosted usage subject to infrastructure | No vendor overage; pay infrastructure only | OpenAI-compatible local API | Runs local LLMs, image/audio models and embeddings depending on setup | $0 software cost | $0 software cost | $0 software cost | $0 software cost | Limited by server hardware and model backend | Self-hosted; data stays on your infrastructure if configured correctly | Teams needing OpenAI-compatible local/private endpoints | Ops burden and performance tuning are on you |
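
The per-1M-token price columns above can be turned into a rough spend estimate with simple arithmetic: tokens times price, divided by one million, with the 50 percent batch discount applied where the table lists a Batch API row. The sketch below is illustrative only. The dictionary keys are hypothetical labels (not official API model identifiers), the `estimate_cost` helper is an assumption of this example, and the prices are copied from the table, so they should be checked against each provider's current pricing page before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD per 1M tokens, as listed in the table's price columns."""
    input_usd: float
    output_usd: float


# Prices copied from the table rows. The keys are illustrative labels
# chosen for this sketch, not official API model identifiers.
PRICES = {
    "gpt-5.5": TokenPrice(5.00, 30.00),
    "gpt-5.4-mini": TokenPrice(0.75, 4.50),
    "claude-opus-4.8": TokenPrice(5.00, 25.00),
    "claude-haiku-4.5": TokenPrice(1.00, 5.00),
    "gemini-2.5-flash": TokenPrice(0.10, 0.40),
}

# 50 percent batch discount, per the OpenAI and Anthropic Batch API rows.
BATCH_DISCOUNT = 0.5


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Return an estimated cost in USD for the given token counts."""
    price = PRICES[model]
    cost = (input_tokens * price.input_usd
            + output_tokens * price.output_usd) / 1_000_000
    if batch:
        # Only meaningful for providers whose rows list a batch plan.
        cost *= 1 - BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

This estimate covers token charges only. It ignores prompt caching, search or request fees, long-context surcharges, and neuron- or runtime-based billing, all of which the table flags as provider-specific.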