YixScout

AI Audio Tools

AI tools for voice generation, music creation, podcast cleanup, transcription, and meeting audio.

AI-citable summary

Last reviewed: 2026-06-04 by YixScout editorial team

What are AI Audio Tools?

AI Audio Tools are curated AI products in this directory for voice generation, music creation, podcast cleanup, transcription, and meeting audio.

How to choose AI Audio Tools

Start with the task, then compare official availability, pricing, privacy posture, output quality, and alternatives such as ElevenLabs, Fish Audio, and Cartesia.


AI Audio Tools

ElevenLabs: An AI voice platform for text-to-speech, voice cloning, dubbing, narration, and multilingual audio generation.

Fish Audio: A low-cost text-to-speech platform with open-weights voice cloning from a short sample, fine-grained emotion control, and support for 80+ languages.

Cartesia: An ultra-low-latency text-to-speech API (Sonic) built for real-time conversational voice agents, billed per character, with instant voice cloning.

OpenAI TTS: OpenAI's text-to-speech API with preset natural voices and steerable tone, billed per token or per character, with no voice cloning.

Azure AI Speech (TTS): Microsoft Azure's enterprise text-to-speech with 100+ languages and locales, neural and HD voices, custom voice options, Speech SDK/REST access, and compliance-grade infrastructure.

Chatterbox (Resemble AI): An open-source (MIT) text-to-speech model family from Resemble AI with voice cloning from a few seconds of audio and competitive quality, free for commercial use.

Deepgram: A real-time speech-to-text platform (Nova/Flux) built for low-latency voice agents, with batch and streaming transcription and per-minute pricing.

AssemblyAI: A speech-to-text API (Universal-3 Pro, Universal-2, and streaming models) that pairs transcription with speech intelligence such as summaries, sentiment, topic detection, and speaker labels.

OpenAI Whisper: OpenAI's open-source speech recognition model family supporting 99+ languages, widely used as an accuracy reference and free to self-host.

Google Cloud Speech-to-Text: Google Cloud's enterprise speech recognition API with broad language coverage, streaming and batch transcription, and Google's infrastructure.

ElevenLabs Scribe: ElevenLabs' speech-to-text model (Scribe v2) for accurate multilingual transcription and real-time use, complementing its TTS platform.

Suno: An AI music creation platform for generating songs, vocals, instrumentals, and creative audio from prompts.

Udio: An AI music generator for creating songs, instrumental ideas, vocals, and shareable audio experiments.

Murf: An AI voice generator for studio-quality voiceovers, presentations, training videos, ads, and product explainers.

Krisp: An AI meeting audio tool for noise cancellation, voice clarity, meeting notes, and call productivity.

Adobe Podcast: Adobe's AI audio tool for enhancing speech, cleaning recordings, and improving the quality of podcasts and other voice content.

AIVA: An AI music composition platform for scores, instrumentals, and composer workflows that account for licensing.

SOUNDRAW: An AI background-music generator focused on royalty-free commercial tracks, editing, distribution, and API/enterprise paths.

Mubert: An AI music API and generation platform built around licensed/partner content and commercially safer background music generation.

OpenAI Realtime API: OpenAI's realtime audio API for building low-latency voice interactions, live speech conversations, and multimodal agent experiences.

Retell AI: A platform for building, testing, deploying, and monitoring inbound and outbound AI phone agents with telephony, tools, and analytics.

Bland AI: An enterprise voice AI platform for building, running, and monitoring inbound and outbound AI phone agents at scale.

Rasa Voice: Rasa's enterprise voice experience platform for realtime conversations with turn-taking, interruption handling, and control over ASR/TTS providers.

Inworld: A realtime voice and AI character platform with streaming TTS, STT, voice cloning, and API layers for voice-first applications.