Audio

Text-to-speech, speech-to-text, music, and voice cloning.

Tool	Category	Segment	Platform	Plan	Monthly Price USD	Pricing Model	Free Tier / Trial	Included Credits / Usage	Audio Modes	STT / TTS / Voice Support	API Access	Commercial Rights / Privacy	Models / Quality	Latency / Realtime	Team / Collaboration	Best Fit	Main Limits / Caveats
ElevenLabs Free	Audio	AI voice suite	ElevenLabs	Free	$0	Monthly credit plan	✓	10k credits/month; about 10 TTS minutes listed	TTS, STT, sound effects, voice design, music, productions	TTS and STT included; voice cloning/design limited	✓	No commercial license on free plan; account terms apply	High-quality hosted voice models	Streaming and low-latency options exist, but free is limited	Individual account	Testing ElevenLabs voice quality before paying	Credits reset monthly; commercial use requires paid tier
ElevenLabs Starter 1 month subscription	Audio	AI voice suite	ElevenLabs	Starter	$6/mo	Monthly subscription with credits	No; paid plan	30k credits/month; about 30 TTS minutes listed	TTS, STT, voice cloning, dubbing, music	TTS/STT plus instant voice cloning	✓	Commercial license included	High-quality hosted voice models	API and streaming support; credit limits still apply	Individual creator account	Lowest paid ElevenLabs tier with commercial license	Small credit pool; professional voice cloning starts higher
ElevenLabs Creator 1 month subscription	Audio	AI voice suite	ElevenLabs	Creator	$22/mo list; $11 first-month promo displayed	Monthly subscription with credits	No; paid plan	121k credits/month; about 121 TTS minutes listed	TTS, STT, professional voice cloning, dubbing	TTS/STT plus professional voice cloning	✓	Commercial license included; paid credits can roll over up to two months while active	Higher creator-grade voice quality and cloning	Good for creator workflows; 128kbps output listed	Individual creator account	Main creator tier for cloning and recurring content	Displayed promo can differ from ongoing list price; credits vary by model
ElevenLabs Pro 1 month subscription	Audio	AI voice suite	ElevenLabs	Pro	$99/mo	Monthly subscription with credits	No; paid plan	600k credits/month; about 600 TTS minutes listed	TTS, STT, cloning, studio/API audio	Higher-quality API output and creator features	✓	Commercial license included	192kbps and 44.1kHz PCM API output listed	Higher production throughput than Creator	Individual/pro creator account	Professional narration, apps and voice products needing more quota	Still credit-based; no team seats until Scale
ElevenLabs Scale 1 month subscription	Audio	AI voice suite	ElevenLabs	Scale	$299/mo	Monthly subscription with team seats	No; paid plan	1.8M credits/month; 3 seats; 3 professional voice clones	TTS, STT, voice cloning, productions, team workspace	Fuller production voice suite	✓	Commercial license included	High-volume hosted voice models	Team/project workflows and higher included credits	3 workspace seats	Small teams producing recurring AI audio	Business/enterprise features and larger seats cost more
Cartesia Free	Audio	Realtime TTS/STT API	Cartesia	Free	$0	Monthly included credits	✓	Sonic-3.5 TTS about 27 minutes/month; 2 concurrent requests listed	TTS, STT and voice agent building blocks	Sonic TTS; Ink STT on paid plan table	✓	Commercial/data terms depend on Cartesia account terms	Sonic-3.5 low-latency TTS	Designed for realtime voice; low-latency positioning	Individual developer account	Trying low-latency TTS before committing	Free usage is small and concurrency-limited
Cartesia Starter 1 month subscription	Audio	Realtime TTS/STT API	Cartesia	Starter	$5/mo	Monthly subscription with credits	No; paid plan	Sonic-3.5 about 133 minutes/month; 3 concurrent requests listed	TTS plus basic voice cloning workflows	TTS and selected voice features	✓	Commercial/data terms depend on Cartesia account terms	Sonic-3.5; instant voice cloning listed	Realtime-oriented TTS	Individual developer account	Low-cost TTS API for prototypes and small products	STT hours and advanced agent features sit in higher plan areas
Cartesia Pro 1 month subscription	Audio	Realtime TTS/STT API	Cartesia	Pro	$49/mo	Monthly subscription with larger credits	No; paid plan	Sonic-3.5 about 1,667 minutes/month; 5 concurrent requests listed	TTS, voice cloning and voice API workflows	TTS with voice cloning and broader API usage	✓	Commercial/data terms depend on Cartesia account terms	Sonic-3.5 production TTS	Higher quota/concurrency for realtime voice apps	Individual/pro developer account	Production TTS with predictable monthly included minutes	Exact STT/agent usage may consume credits differently
Cartesia Scale 1 month subscription	Audio	Realtime TTS/STT API	Cartesia	Scale	$299/mo	Monthly subscription with high-volume credits	No; paid plan	Sonic-3.5 about 10,667 minutes/month; 15 concurrent requests listed	High-volume TTS/STT/voice agent stack	TTS plus broader speech platform features	✓	Commercial/data terms depend on Cartesia account terms	Sonic-3.5 and Ink-2 platform models	Higher concurrency and realtime throughput	Team/production workflow	High-volume voice products needing predictable TTS minutes	Custom enterprise still required for larger terms
Deepgram Pay As You Go	Audio	Speech API	Deepgram	PAYG	$0 base	Usage-based API with free credit	✓	$200 free credit; then pay-as-you-go	STT, TTS, voice agent API, audio intelligence	Nova/Flux STT, Aura TTS, Voice Agent API	✓	Deepgram account terms apply; enterprise terms available separately	Nova-3 STT, Flux STT, Aura TTS	Realtime streaming and voice-agent latency support	Developer/startup account; concurrency up to the public-model limits listed	Production speech API with strong free credit	Free credit is a one-time balance, not a recurring monthly allowance
Deepgram Growth annual commitment	Audio	Speech API	Deepgram	Growth	$4K+/year	Annual prepaid credits with discount	No; paid annual commitment	Prepaid credits redeemed against usage; up to 20% savings listed	STT, TTS, voice agent API, audio intelligence	Same public model endpoints with higher WSS concurrency	✓	Deepgram account terms apply	Nova-3/Flux/Aura public models	Higher WSS concurrency than PAYG on pricing page	Growing application plan	Teams with predictable speech volume	Annual commitment; not a casual monthly plan
AssemblyAI Free	Audio	Speech-to-text API	AssemblyAI	Free	$0	Free API tier	✓	Up to 185 hours of pre-recorded audio and 333 hours of streaming audio listed as free	Pre-recorded STT, realtime STT, speech understanding	STT-focused; voice agent API also listed	✓	AssemblyAI account terms apply	Universal STT models	Realtime endpoint available on free tier	Developer account	Testing transcription and speech understanding at meaningful volume	Free tier scope and rate limits can differ from production needs
AssemblyAI Pay As You Go	Audio	Speech-to-text API	AssemblyAI	PAYG	$0 base	Usage-based API	After free tier	Universal-3 Pro $0.21/hr; Universal-2 $0.15/hr; Whisper-Streaming $0.30/hr listed	Pre-recorded STT, realtime STT, speech understanding, voice agents	STT and speech intelligence APIs	✓	AssemblyAI account terms apply	Universal-3 Pro and Universal-2 transcription models	Realtime STT available; pricing by audio hour	Developer/product account	Accurate transcription and speech intelligence without subscription	TTS is not the main product; add-ons may cost extra
OpenAI GPT-Realtime-2 API	Audio	Realtime voice API	OpenAI	GPT-Realtime-2	$0 base	Usage-based audio/text/image tokens	No included free tier on pricing page	Audio $32/1M input tokens, $0.40/1M cached input, $64/1M output tokens	Speech-to-speech realtime voice agents	Native realtime voice model with text/image input support	✓	API data and privacy governed by OpenAI API terms	Most capable OpenAI realtime voice model listed	Realtime conversational voice with interruption handling	Developer/API account	Native voice agents inside OpenAI stack	Token-based audio billing is harder to estimate than per-minute STT/TTS
OpenAI GPT-Realtime-Whisper / Translate API	Audio	Realtime STT / translation API	OpenAI	Realtime Whisper / Translate	$0 base	Per-minute realtime audio pricing	No included free tier on pricing page	GPT-Realtime-Whisper $0.017/min; GPT-Realtime-Translate $0.034/min	Live transcription and live speech translation	Realtime STT and translation	✓	API data and privacy governed by OpenAI API terms	OpenAI realtime transcription and translation models	Designed for live speech streams	Developer/API account	Live captions, translation and voice products using OpenAI	Separate from full speech-to-speech GPT-Realtime-2 billing
OpenAI GPT-4o mini TTS API	Audio	Text-to-speech API	OpenAI	GPT-4o mini TTS	$0 base	Usage-based TTS tokens	No included free tier on model page	Text input $0.60/1M tokens; audio output $12/1M tokens	TTS generation	TTS only for this model row	✓	API data and privacy governed by OpenAI API terms	Steerable OpenAI TTS model	API latency depends on request size and endpoint	Developer/API account	Programmatic TTS when already using OpenAI APIs	2,000 input token maximum listed on model page
Groq Whisper V3 Large API	Audio	Fast STT API	Groq	Whisper V3 Large	$0 base	Usage-based API; free console access via limits	Free API key / free plan limits listed separately	$0.111/hour on pricing page; audio minimum 10 seconds/request	Speech-to-text transcription and translation	Whisper-family STT on GroqCloud	✓	GroqCloud terms apply; model license/terms apply	Whisper Large V3 hosted on Groq	High speed factor listed; very fast transcription	Developer/API account	Fast, cheap Whisper transcription	Free usage is capped by rate limits; not a monthly credit plan
Groq Whisper V3 Turbo API	Audio	Fast STT API	Groq	Whisper V3 Turbo	$0 base	Usage-based API; free console access via limits	Free API key / free plan limits listed separately	$0.04/hour on pricing page; audio minimum 10 seconds/request	Speech-to-text transcription and translation	Whisper-family STT on GroqCloud	✓	GroqCloud terms apply; model license/terms apply	Whisper V3 Turbo for lower-cost STT	High-speed hosted inference	Developer/API account	Lowest-cost hosted Whisper-style transcription in this set	Accuracy/features differ from larger model; free plan has rate limits
Google Cloud Speech-to-Text API	Audio	Cloud speech API	Google Cloud	Speech-to-Text	$0 base	Usage-based per processed minute	Yes for V1 standard	V1 standard: 60 minutes/month free; then $0.016/min with data logging or $0.024/min without data logging; V2 standard starts $0.016/min	Speech recognition and transcription	STT only	✓	Google Cloud terms apply; data logging choice affects pricing	Google standard/chirp speech recognition models	Batch and realtime modes; dynamic batch $0.003/min for V2 standard	Google Cloud project/account	Google Cloud teams needing STT and cloud billing controls	Free tier details differ between V1/V2 and data logging mode
Google Cloud Text-to-Speech API	Audio	Cloud speech API	Google Cloud	Text-to-Speech	$0 base	Usage-based per character/token	Yes for several voice classes	Standard/WaveNet 4M chars free; Neural2/Studio/Polyglot 1M free; Chirp 3 HD 1M free; Gemini TTS has no free usage listed	Text-to-speech synthesis	TTS only	✓	Google Cloud terms apply	Standard, WaveNet, Neural2, Chirp 3 HD, Gemini TTS	Cloud TTS latency depends on voice/model	Google Cloud project/account	Broad multilingual TTS with large monthly free character pools	Gemini TTS is token-priced and has no free limit on pricing page
Amazon Polly Pay As You Go	Audio	Cloud TTS API	Amazon Polly	PAYG	$0 base	Usage-based per 1M characters	Yes for first 12 months	Free tier: 5M Standard, 1M Neural, 500k Long-Form, 100k Generative chars/month for first 12 months; then Standard $4/1M, Neural $16/1M, Generative $30/1M	Text-to-speech and speech marks	TTS only	✓	AWS terms apply	Standard, Neural, Long-Form and Generative voices	Cloud TTS; realtime app latency depends on integration	AWS account/project	Cheap predictable TTS in AWS environments	Free tier is time-limited for new accounts
Amazon Transcribe Pay As You Go	Audio	Cloud STT API	Amazon Transcribe	PAYG	$0 base	Usage-based per audio minute	Yes for first 12 months	Free tier: 60 minutes/month for 12 months; pay-as-you-go after	Batch and streaming speech transcription	STT only	✓	AWS terms apply	Amazon Transcribe speech recognition	Streaming and batch transcription available	AWS account/project	AWS-native transcription and call analytics workflows	Free tier is small and time-limited; exact paid per-minute rate depends on region/tier
Azure AI Speech Free F0	Audio	Cloud speech suite	Azure AI Speech	Free F0	$0	Free Azure Speech tier	✓	5 audio hours/month STT; 0.5M neural TTS chars/month; 5 audio hours speech translation/month	STT, TTS, speech translation and speech services	STT/TTS/translation; custom limits on F0	✓	Microsoft Azure terms apply	Azure neural speech and transcription services	Realtime transcription free hours listed	Azure account/resource	Testing Azure Speech before pay-as-you-go	Free tier has concurrency and batch restrictions; some PAYG prices were not shown on the public page when checked
Whisper Self-hosted	Audio	Local STT	Whisper	Open source	$0 software	Free local/self-hosted software	Yes; local unlimited if you provide hardware	No hosted credits; limited by local CPU/GPU and model size	Speech-to-text and translation	STT/ASR; no TTS	Local CLI/library; no hosted API included	MIT-licensed software; privacy stays local because audio does not need to leave the machine	Whisper models including large variants	Hardware-dependent; not realtime by default without extra tooling	Individual/local self-host	Private offline transcription and batch processing	Requires setup, compute and operational maintenance
Piper Self-hosted	Audio	Local TTS	Piper	Open source	$0 software	Free local/self-hosted software	Yes; local unlimited if you provide hardware	No hosted credits; limited by hardware and selected voices	Text-to-speech	TTS only	Local CLI/library; no hosted API included	MIT-licensed software; voice/model licenses may vary	Fast neural TTS aimed at local use	Fast on local hardware; suitable for offline assistants	Individual/local self-host	Lightweight offline TTS with predictable cost	Voice quality and language coverage depend on available voices
F5-TTS Self-hosted	Audio	Local TTS / voice cloning	F5-TTS	Open source	$0 software	Free local/self-hosted software	Yes; local unlimited if you provide hardware	No hosted credits; limited by local GPU/CPU and model/license	Text-to-speech and voice cloning/reconstruction workflows	TTS; voice cloning depends on model/use	Local/self-hosted; no official hosted API included	Project/model license terms apply; privacy can stay local	Modern open-source TTS from local resource list	Hardware-dependent generation speed	Individual/local self-host	Experimenting with open-source voice cloning/TTS	Requires model setup and careful rights/consent handling for voice cloning
Bark Self-hosted	Audio	Local expressive TTS	Bark	Open source	$0 software	Free local/self-hosted software	Yes; local unlimited if you provide hardware	No hosted credits; limited by local hardware	Text-to-audio / expressive speech generation	TTS-like generative audio; no STT	Local/self-hosted; no official hosted API included	License/model terms apply; privacy can stay local	Expressive local generative speech/audio	Slower and more experimental than production APIs	Individual/local self-host	Creative local speech/audio experiments	Less predictable than API TTS; setup and model management required
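
The rows above quote prices in different units (per hour, per minute, or a monthly plan with approximate included minutes), so they are hard to compare at a glance. The Python sketch below converts a handful of the listed figures to a common unit. The function names and the choice of rows are illustrative assumptions rather than part of any vendor API, and the arithmetic ignores free tiers, minimum request lengths, overage rates and model-dependent credit usage.

```python
# Rough price normalization for a few rows of the table above.
# All figures are copied from the table; names are hypothetical helpers.

# STT prices as (price_usd, unit), where unit is "hour" or "minute".
STT_PRICES = {
    "Groq Whisper V3 Turbo": (0.04, "hour"),
    "Groq Whisper V3 Large": (0.111, "hour"),
    "AssemblyAI Universal-2": (0.15, "hour"),
    "AssemblyAI Universal-3 Pro": (0.21, "hour"),
    "Google STT V1 standard (data logging on)": (0.016, "minute"),
    "OpenAI GPT-Realtime-Whisper": (0.017, "minute"),
}

# TTS plans as (monthly_price_usd, approx_included_tts_minutes).
TTS_PLANS = {
    "ElevenLabs Starter": (6, 30),
    "ElevenLabs Creator (list price)": (22, 121),
    "ElevenLabs Pro": (99, 600),
    "Cartesia Starter": (5, 133),
    "Cartesia Pro": (49, 1667),
    "Cartesia Scale": (299, 10667),
}


def usd_per_hour(price, unit):
    """Convert a per-hour or per-minute price to USD per audio hour."""
    if unit == "hour":
        return price
    if unit == "minute":
        return price * 60
    raise ValueError(f"unknown unit: {unit}")


def usd_per_included_minute(monthly_price, included_minutes):
    """Effective cost of one included TTS minute if the plan is fully used."""
    return monthly_price / included_minutes


def rank_stt():
    """Return (tool, usd_per_hour) pairs sorted from cheapest to priciest."""
    rows = [(tool, usd_per_hour(p, u)) for tool, (p, u) in STT_PRICES.items()]
    return sorted(rows, key=lambda row: row[1])


def rank_tts():
    """Return (plan, usd_per_minute) pairs sorted from cheapest to priciest."""
    rows = [(plan, usd_per_included_minute(m, mins))
            for plan, (m, mins) in TTS_PLANS.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for tool, cost in rank_stt():
        print(f"{tool}: ${cost:.3f}/hour")
    for plan, cost in rank_tts():
        print(f"{plan}: ${cost:.4f}/included minute")
```

The per-included-minute figure only holds when the whole monthly allowance is consumed; lighter usage raises the effective rate, and token-priced rows such as GPT-Realtime-2 are left out because the table gives no tokens-per-minute conversion.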