Audio

Text-to-speech, speech-to-text, music, and voice cloning.

| Tool | Category | Segment | Platform | Plan | Monthly Price USD | Pricing Model | Free Tier / Trial | Included Credits / Usage | Audio Modes | STT / TTS / Voice Support | API Access | Commercial Rights / Privacy | Models / Quality | Latency / Realtime | Team / Collaboration | Best Fit | Main Limits / Caveats |
| | Audio | AI voice suite | ElevenLabs | Free | $0 | Monthly credit plan | | 10k credits/month; about 10 TTS minutes listed | TTS, STT, sound effects, voice design, music, productions | TTS and STT included; voice cloning/design limited | | No commercial license on free plan; account terms apply | High-quality hosted voice models | Streaming and low-latency options exist, but free is limited | Individual account | Testing ElevenLabs voice quality before paying | Credits reset monthly; commercial use requires paid tier |
| | Audio | AI voice suite | ElevenLabs | Starter | $6/mo | Monthly subscription with credits | No; paid plan | 30k credits/month; about 30 TTS minutes listed | TTS, STT, voice cloning, dubbing, music | TTS/STT plus instant voice cloning | | Commercial license included | High-quality hosted voice models | API and streaming support; credit limits still apply | Individual creator account | Lowest paid ElevenLabs tier with commercial license | Small credit pool; professional voice cloning starts higher |
| | Audio | AI voice suite | ElevenLabs | Creator | $22/mo list; $11 first-month promo displayed | Monthly subscription with credits | No; paid plan | 121k credits/month; about 121 TTS minutes listed | TTS, STT, professional voice cloning, dubbing | TTS/STT plus professional voice cloning | | Commercial license included; paid credits can roll over up to two months while active | Higher creator-grade voice quality and cloning | Good for creator workflows; 128kbps output listed | Individual creator account | Main creator tier for cloning and recurring content | Displayed promo can differ from ongoing list price; credits vary by model |
| | Audio | AI voice suite | ElevenLabs | Pro | $99/mo | Monthly subscription with credits | No; paid plan | 600k credits/month; about 600 TTS minutes listed | TTS, STT, cloning, studio/API audio | Higher-quality API output and creator features | | Commercial license included | 192kbps and 44.1kHz PCM API output listed | Higher production throughput than Creator | Individual/pro creator account | Professional narration, apps and voice products needing more quota | Still credit-based; no team seats until Scale |
| | Audio | AI voice suite | ElevenLabs | Scale | $299/mo | Monthly subscription with team seats | No; paid plan | 1.8M credits/month; 3 seats; 3 professional voice clones | TTS, STT, voice cloning, productions, team workspace | Fuller production voice suite | | Commercial license included | High-volume hosted voice models | Team/project workflows and higher included credits | 3 workspace seats | Small teams producing recurring AI audio | Business/enterprise features and larger seats cost more |
| | Audio | Realtime TTS/STT API | Cartesia | Free | $0 | Monthly included credits | | Sonic-3.5 TTS about 27 minutes/month; 2 concurrent requests listed | TTS, STT and voice agent building blocks | Sonic TTS; Ink STT on paid plan table | | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 low-latency TTS | Designed for realtime voice; low-latency positioning | Individual developer account | Trying low-latency TTS before committing | Free usage is small and concurrency-limited |
| | Audio | Realtime TTS/STT API | Cartesia | Starter | $5/mo | Monthly subscription with credits | No; paid plan | Sonic-3.5 about 133 minutes/month; 3 concurrent requests listed | TTS plus basic voice cloning workflows | TTS and selected voice features | | Commercial/data terms depend on Cartesia account terms | Sonic-3.5; instant voice cloning listed | Realtime-oriented TTS | Individual developer account | Low-cost TTS API for prototypes and small products | STT hours and advanced agent features sit in higher plan areas |
| | Audio | Realtime TTS/STT API | Cartesia | Pro | $49/mo | Monthly subscription with larger credits | No; paid plan | Sonic-3.5 about 1,667 minutes/month; 5 concurrent requests listed | TTS, voice cloning and voice API workflows | TTS with voice cloning and broader API usage | | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 production TTS | Higher quota/concurrency for realtime voice apps | Individual/pro developer account | Production TTS with predictable monthly included minutes | Exact STT/agent usage may consume credits differently |
| | Audio | Realtime TTS/STT API | Cartesia | Scale | $299/mo | Monthly subscription with high-volume credits | No; paid plan | Sonic-3.5 about 10,667 minutes/month; 15 concurrent requests listed | High-volume TTS/STT/voice agent stack | TTS plus broader speech platform features | | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 and Ink-2 platform models | Higher concurrency and realtime throughput | Team/production workflow | High-volume voice products needing predictable TTS minutes | Custom enterprise still required for larger terms |
| | Audio | Speech API | Deepgram | PAYG | $0 base | Usage-based API with free credit | | $200 free credit; then pay-as-you-go | STT, TTS, voice agent API, audio intelligence | Nova/Flux STT, Aura TTS, Voice Agent API | | Deepgram account terms apply; enterprise terms available separately | Nova-3 STT, Flux STT, Aura TTS | Realtime streaming and voice-agent latency support | Developer/startup account; up to public-model concurrency listed | Production speech API with strong free credit | Free credit is credit-based, not a permanent monthly allowance |
| | Audio | Speech API | Deepgram | Growth | $4K+/year | Annual prepaid credits with discount | No; paid annual commitment | Prepaid credits redeemed against usage; up to 20% savings listed | STT, TTS, voice agent API, audio intelligence | Same public model endpoints with higher WSS concurrency | | Deepgram account terms apply | Nova-3/Flux/Aura public models | Higher WSS concurrency than PAYG on pricing page | Growing application plan | Teams with predictable speech volume | Annual commitment; not a casual monthly plan |
| | Audio | Speech-to-text API | AssemblyAI | Free | $0 | Free API tier | | Up to 185 hours pre-recorded audio and 333 hours streaming audio free listed | Pre-recorded STT, realtime STT, speech understanding | STT-focused; voice agent API also listed | | AssemblyAI account terms apply | Universal STT models | Realtime endpoint available on free tier | Developer account | Testing transcription and speech understanding at meaningful volume | Free tier scope and rate limits can differ from production needs |
| | Audio | Speech-to-text API | AssemblyAI | PAYG | $0 base | Usage-based API | After free tier | Universal-3 Pro $0.21/hr; Universal-2 $0.15/hr; Whisper-Streaming $0.30/hr listed | Pre-recorded STT, realtime STT, speech understanding, voice agents | STT and speech intelligence APIs | | AssemblyAI account terms apply | Universal-3 Pro and Universal-2 transcription models | Realtime STT available; pricing by audio hour | Developer/product account | Accurate transcription and speech intelligence without subscription | TTS is not the main product; add-ons may cost extra |
| | Audio | Realtime voice API | OpenAI | GPT-Realtime-2 | $0 base | Usage-based audio/text/image tokens | No included free tier on pricing page | Audio $32/1M input tokens, $0.40/1M cached input, $64/1M output tokens | Speech-to-speech realtime voice agents | Native realtime voice model with text/image input support | | API data and privacy governed by OpenAI API terms | Most capable OpenAI realtime voice model listed | Realtime conversational voice with interruption handling | Developer/API account | Native voice agents inside OpenAI stack | Token-based audio billing is harder to estimate than per-minute STT/TTS |
| | Audio | Realtime STT / translation API | OpenAI | Realtime Whisper / Translate | $0 base | Per-minute realtime audio pricing | No included free tier on pricing page | GPT-Realtime-Whisper $0.017/min; GPT-Realtime-Translate $0.034/min | Live transcription and live speech translation | Realtime STT and translation | | API data and privacy governed by OpenAI API terms | OpenAI realtime transcription and translation models | Designed for live speech streams | Developer/API account | Live captions, translation and voice products using OpenAI | Separate from full speech-to-speech GPT-Realtime-2 billing |
| | Audio | Text-to-speech API | OpenAI | GPT-4o mini TTS | $0 base | Usage-based TTS tokens | No included free tier on model page | Text input $0.60/1M tokens; audio output $12/1M tokens | TTS generation | TTS only for this model row | | API data and privacy governed by OpenAI API terms | Steerable OpenAI TTS model | API latency depends on request size and endpoint | Developer/API account | Programmatic TTS when already using OpenAI APIs | 2,000 input token maximum listed on model page |
| | Audio | Fast STT API | Groq | Whisper V3 Large | $0 base | Usage-based API; free console access via limits | Free API key / free plan limits listed separately | $0.111/hour on pricing page; audio minimum 10 seconds/request | Speech-to-text transcription and translation | Whisper-family STT on GroqCloud | | GroqCloud terms apply; model license/terms apply | Whisper Large V3 hosted on Groq | High speed factor listed; very fast transcription | Developer/API account | Fast, cheap Whisper transcription | Free usage is capped by rate limits; not a monthly credit plan |
| | Audio | Fast STT API | Groq | Whisper V3 Turbo | $0 base | Usage-based API; free console access via limits | Free API key / free plan limits listed separately | $0.04/hour on pricing page; audio minimum 10 seconds/request | Speech-to-text transcription and translation | Whisper-family STT on GroqCloud | | GroqCloud terms apply; model license/terms apply | Whisper V3 Turbo for lower-cost STT | High-speed hosted inference | Developer/API account | Lowest-cost hosted Whisper-style transcription in this set | Accuracy/features differ from larger model; free plan has rate limits |
| | Audio | Cloud speech API | Google Cloud | Speech-to-Text | $0 base | Usage-based per processed minute | Yes for V1 standard | V1 standard: 60 minutes/month free; then $0.016/min with data logging or $0.024/min without data logging; V2 standard starts $0.016/min | Speech recognition and transcription | STT only | | Google Cloud terms apply; data logging choice affects pricing | Google standard/chirp speech recognition models | Batch and realtime modes; dynamic batch $0.003/min for V2 standard | Google Cloud project/account | Google Cloud teams needing STT and cloud billing controls | Free tier details differ between V1/V2 and data logging mode |
| | Audio | Cloud speech API | Google Cloud | Text-to-Speech | $0 base | Usage-based per character/token | Yes for several voice classes | Standard/WaveNet 4M chars free; Neural2/Studio/Polyglot 1M free; Chirp 3 HD 1M free; Gemini TTS has no free usage listed | Text-to-speech synthesis | TTS only | | Google Cloud terms apply | Standard, WaveNet, Neural2, Chirp 3 HD, Gemini TTS | Cloud TTS latency depends on voice/model | Google Cloud project/account | Broad multilingual TTS with large monthly free character pools | Gemini TTS is token-priced and has no free limit on pricing page |
| | Audio | Cloud TTS API | Amazon Polly | PAYG | $0 base | Usage-based per 1M characters | Yes for first 12 months | Free tier: 5M Standard, 1M Neural, 500k Long-Form, 100k Generative chars/month for first 12 months; then Standard $4/1M, Neural $16/1M, Generative $30/1M | Text-to-speech and speech marks | TTS only | | AWS terms apply | Standard, Neural, Long-Form and Generative voices | Cloud TTS; realtime app latency depends on integration | AWS account/project | Cheap predictable TTS in AWS environments | Free tier is time-limited for new accounts |
| | Audio | Cloud STT API | Amazon Transcribe | PAYG | $0 base | Usage-based per audio minute | Yes for first 12 months | Free tier: 60 minutes/month for 12 months; pay-as-you-go after | Batch and streaming speech transcription | STT only | | AWS terms apply | Amazon Transcribe speech recognition | Streaming and batch transcription available | AWS account/project | AWS-native transcription and call analytics workflows | Free tier is small and time-limited; exact paid per-minute rate depends on region/tier |
| | Audio | Cloud speech suite | Azure AI Speech | Free F0 | $0 | Free Azure Speech tier | | 5 audio hours/month STT; 0.5M neural TTS chars/month; 5 audio hours speech translation/month | STT, TTS, speech translation and speech services | STT/TTS/translation; custom limits on F0 | | Microsoft Azure terms apply | Azure neural speech and transcription services | Realtime transcription free hours listed | Azure account/resource | Testing Azure Speech before pay-as-you-go | Free tier has concurrency and batch restrictions; public page hid some PAYG prices in this crawl |
| | Audio | Local STT | Whisper | Open source | $0 software | Free local/self-hosted software | Yes; local unlimited if you provide hardware | No hosted credits; limited by local CPU/GPU and model size | Speech-to-text and translation | STT/ASR; no TTS | Local CLI/library; no hosted API included | MIT-licensed software; privacy stays local, audio data does not need to leave machine | Whisper models including large variants | Hardware-dependent; not realtime by default without extra tooling | Individual/local self-host | Private offline transcription and batch processing | Requires setup, compute and operational maintenance |
| | Audio | Local TTS | Piper | Open source | $0 software | Free local/self-hosted software | Yes; local unlimited if you provide hardware | No hosted credits; limited by hardware and selected voices | Text-to-speech | TTS only | Local CLI/library; no hosted API included | MIT-licensed software; voice/model licenses may vary | Fast neural TTS aimed at local use | Fast on local hardware; suitable for offline assistants | Individual/local self-host | Lightweight offline TTS with predictable cost | Voice quality and language coverage depend on available voices |
| | Audio | Local TTS / voice cloning | F5-TTS | Open source | $0 software | Free local/self-hosted software | Yes; local unlimited if you provide hardware | No hosted credits; limited by local GPU/CPU and model/license | Text-to-speech and voice cloning/reconstruction workflows | TTS; voice cloning depends on model/use | Local/self-hosted; no official hosted API included | Project/model license terms apply; privacy can stay local | Modern open-source TTS from local resource list | Hardware-dependent generation speed | Individual/local self-host | Experimenting with open-source voice cloning/TTS | Requires model setup and careful rights/consent handling for voice cloning |
| | Audio | Local expressive TTS | Bark | Open source | $0 software | Free local/self-hosted software | Yes; local unlimited if you provide hardware | No hosted credits; limited by local hardware | Text-to-audio / expressive speech generation | TTS-like generative audio; no STT | Local/self-hosted; no official hosted API included | License/model terms apply; privacy can stay local | Expressive local generative speech/audio | Slower and more experimental than production APIs | Individual/local self-host | Creative local speech/audio experiments | Less predictable than API TTS; setup and model management required |
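
Several subscription rows above list both a monthly price and an approximate number of included TTS minutes, so a rough effective price per included minute can be derived by dividing one by the other. The following is a minimal sketch of that arithmetic, using only the ElevenLabs and Cartesia figures from the table. The `Plan` type and the function names are illustrative assumptions, not part of any vendor SDK, and the result ignores promos, overage pricing, credit rollover, and per-model credit differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One subscription row: list price and listed TTS minutes (illustrative type)."""
    platform: str
    name: str
    monthly_usd: float
    listed_tts_minutes: float


# Figures copied from the table above; only paid rows that list TTS minutes.
PLANS = [
    Plan("ElevenLabs", "Starter", 6, 30),
    Plan("ElevenLabs", "Creator", 22, 121),   # $22/mo list price, not the promo
    Plan("ElevenLabs", "Pro", 99, 600),
    Plan("Cartesia", "Starter", 5, 133),
    Plan("Cartesia", "Pro", 49, 1667),
    Plan("Cartesia", "Scale", 299, 10667),
]


def usd_per_minute(plan: Plan) -> float:
    """Monthly price divided by the listed included TTS minutes."""
    return plan.monthly_usd / plan.listed_tts_minutes


def cheapest_first(plans):
    """Return plans sorted by effective price per listed minute, lowest first."""
    return sorted(plans, key=usd_per_minute)


if __name__ == "__main__":
    for plan in cheapest_first(PLANS):
        rate = usd_per_minute(plan)
        print(f"{plan.platform} {plan.name}: ${rate:.3f} per listed TTS minute")
```

The listed minutes are approximations from each vendor's pricing page, so the ranking is only a first-pass comparison; usage-based rows (per token, per character, per audio hour) would need their own unit conversion before they could be compared on the same scale.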