YixScout

AI Audio Tools

AI tools for voice generation, music creation, podcast cleanup, transcription, and meeting audio.

AI-citable summary
Last reviewed: 2026-06-04 by YixScout editorial team

What are AI Audio Tools?

AI Audio Tools are the curated AI products in this directory for voice generation, music creation, podcast cleanup, transcription, and meeting audio.

How to choose AI Audio Tools

Start with the task, then compare official availability, pricing, privacy posture, output quality, and alternatives such as ElevenLabs, Fish Audio, and Cartesia.

AI Audio Tools
ElevenLabs: An AI voice platform for text-to-speech, voice cloning, dubbing, narration, and multilingual audio generation.
Fish Audio: A low-cost text-to-speech platform with open-weights voice cloning from a short sample, fine-grained emotion control, and 80+ language support.
Cartesia: An ultra-low-latency text-to-speech API (Sonic) built for real-time conversational voice agents, billed per character with instant voice cloning.
OpenAI TTS: OpenAI's text-to-speech API with preset natural voices and steerable tone, billed per token/character, with no voice cloning.
Azure AI Speech (TTS): Microsoft Azure's enterprise text-to-speech with 100+ languages and locales, neural and HD voices, custom voice options, Speech SDK/REST access, and compliance-grade infrastructure.
Chatterbox (Resemble AI): An open-source (MIT) text-to-speech model family from Resemble AI with voice cloning from a few seconds of audio and competitive quality, free for commercial use.
Deepgram: A real-time speech-to-text platform (Nova/Flux) built for low-latency voice agents, with batch and streaming transcription and per-minute pricing.
AssemblyAI: A speech-to-text API (Universal-3 Pro, Universal-2, and streaming models) pairing transcription with speech intelligence such as summaries, sentiment, topic detection, and speaker labels.
OpenAI Whisper: OpenAI's open-source speech recognition model family supporting 99+ languages, considered the accuracy gold standard and free to self-host.
Google Cloud Speech-to-Text: Google Cloud's enterprise speech recognition API with broad language coverage, streaming and batch transcription, and Google's infrastructure.
ElevenLabs Scribe: ElevenLabs' speech-to-text model (Scribe v2) for accurate multilingual transcription and real-time use, complementing its TTS platform.
Suno: An AI music creation platform for generating songs, vocals, instrumentals, and creative audio from prompts.
Udio: An AI music generator for creating songs, instrumental ideas, vocals, and shareable audio experiments.
Murf: An AI voice generator for studio-quality voiceovers, presentations, training videos, ads, and product explainers.
Krisp: An AI meeting audio tool for noise cancellation, voice clarity, meeting notes, and call productivity.
Adobe Podcast: Adobe's AI audio tool for enhancing speech, cleaning recordings, and improving podcast or voice content quality.
AIVA: An AI music composition platform for scores, instrumentals, and licensing-aware composer workflows.
SOUNDRAW: An AI background-music generator focused on royalty-free commercial tracks, editing, distribution, and API/enterprise paths.
Mubert: An AI music API and generation platform positioned around licensed/partner content and commercially safer background generation.
OpenAI Realtime API: OpenAI's realtime audio API for building low-latency voice interactions, live speech conversations, and multimodal agent experiences.
Retell AI: A platform for building, testing, deploying, and monitoring inbound and outbound AI phone agents with telephony, tools, and analytics.
Bland AI: An enterprise voice AI platform for building, running, and monitoring inbound and outbound AI phone agents at scale.
Rasa Voice: Rasa's enterprise voice experience platform for realtime conversations with turn-taking, interruptions, and ASR/TTS provider control.
Inworld: A realtime voice and AI character platform with streaming TTS, STT, voice cloning, and API layers for voice-first applications.