SSML Pause & Pronunciation Control

Precise pauses. Perfect pronunciation. Full control.

What Is SSML Control?

SSML (Speech Synthesis Markup Language) gives you precise control over how text-to-speech engines deliver your script. Instead of hoping the engine guesses the right timing, you specify exactly where to pause, how long to wait, and how to pronounce special content like dates, numbers, and addresses.

SoundWorks supports SSML through two features: p-tag pauses for timing control and Say As formatting for pronunciation.

P-Tag Pauses

Insert precise breaks anywhere in your text. SoundWorks uses a simple <p> tag syntax to add pauses of specific durations.

Available pause lengths: 0.5 seconds to 5 seconds in configurable increments.

Use pauses to:

  • Add dramatic timing between sentences
  • Create natural breathing pauses in long narration
  • Separate sections or chapters in audiobook content
  • Give listeners time to absorb complex information
  • Match timing to video scenes in narrated presentations
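For readers curious what a pause looks like in raw markup: standard SSML (the W3C specification) writes a pause as a <break> element with a time attribute, while <p> in standard SSML marks a paragraph. The Python sketch below is illustrative only. The helpers break_tag and join_with_pauses are hypothetical names, not part of SoundWorks, and the markup SoundWorks actually produces from its own pause tags may differ. The sketch assumes the 0.5 to 5 second range stated above.

```python
from xml.sax.saxutils import escape

MIN_PAUSE_S = 0.5  # shortest pause length mentioned on this page
MAX_PAUSE_S = 5.0  # longest pause length mentioned on this page


def break_tag(seconds: float) -> str:
    """Return a standard SSML <break> element for the given duration."""
    if not MIN_PAUSE_S <= seconds <= MAX_PAUSE_S:
        raise ValueError(
            f"pause must be between {MIN_PAUSE_S} and {MAX_PAUSE_S} seconds"
        )
    millis = round(seconds * 1000)
    return f'<break time="{millis}ms"/>'


def join_with_pauses(sentences: list[str], seconds: float) -> str:
    """Wrap sentences in <speak>, separated by a fixed-length pause."""
    pause = break_tag(seconds)
    # Escape &, < and > so the script text cannot break the markup.
    body = f" {pause} ".join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(join_with_pauses(["Chapter one.", "It was a quiet morning."], 1.5))
```

Expressing the duration in milliseconds avoids fractional values such as "1.5s", although standard SSML accepts both forms.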

Say As Formatting

The Say As feature tells the TTS engine how to interpret and pronounce specific content types:

  • Dates: “01/15/2025” read as “January fifteenth, twenty twenty-five” instead of “zero one slash fifteen slash twenty twenty-five”
  • Numbers: “1,500,000” read as “one million five hundred thousand” instead of individual digits
  • Addresses: Street addresses read in natural spoken format
  • Custom formatting: Additional format types supported by the underlying TTS engine
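In standard SSML, this kind of hint is written with the say-as element and its interpret-as attribute. The sketch below shows how such markup could be generated in Python. The say_as helper is a hypothetical name and not a SoundWorks API. The interpret-as values shown (date, cardinal, address) are ones AWS Polly documents; other engines may accept a different set, so treat the values as assumptions to check against your engine.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str, fmt: Optional[str] = None) -> str:
    """Wrap text in a standard SSML <say-as> element."""
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt is not None:
        # "format" refines the interpretation, e.g. month-day-year for dates.
        attrs += f" format={quoteattr(fmt)}"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


if __name__ == "__main__":
    print(say_as("01/15/2025", "date", "mdy"))
    print(say_as("1500000", "cardinal"))
    print(say_as("123 Main St", "address"))
```

The date example passes the format explicitly because "01/15/2025" is ambiguous between month-first and day-first conventions.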

Why Use SSML?

Professional narration quality. Raw TTS often rushes through content without natural pauses. SSML lets you craft timing that sounds like a human narrator who has rehearsed the script.

Correct pronunciation. TTS engines can misread dates, large numbers, abbreviations, and formatted text. Say As ensures they interpret content correctly.

Consistent output. Once you set your SSML markup, every generation produces the same timing and pronunciation. No guesswork, no variability.

How It Works

Step 1: Write your script. Enter or import your text normally.

Step 2: Add SSML tags. Insert pause tags where you want breaks. Use the Say As dialog to format dates, numbers, and other special content.

Step 3: Generate. The TTS engine processes your SSML-marked text and produces speech with the exact timing and pronunciation you specified.

Engine Compatibility

SSML support varies by TTS engine. AWS Polly and IBM Watson have full SSML support. OpenAI TTS and ElevenLabs have limited or no SSML support. Local engines (Silero, IndexTTS2, Qwen3) handle pause tags but may not support all SSML features.

SoundWorks indicates which SSML features are available for your selected engine, so you always know what markup will be respected.
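If you work with raw SSML outside SoundWorks, one common fallback for an engine that ignores markup is to strip the tags and send plain text. The following is a minimal sketch of that idea using Python's standard library. It is not a description of how SoundWorks handles unsupported engines, and strip_ssml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_ssml(ssml: str) -> str:
    """Return the spoken text of an SSML document with all tags removed."""
    root = ET.fromstring(ssml)
    # itertext() walks element text and tails in document order.
    text = "".join(root.itertext())
    # Collapse the extra whitespace left behind by removed elements.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = (
        '<speak>Hello <break time="500ms"/> world, due '
        '<say-as interpret-as="date" format="mdy">01/15/2025</say-as>.</speak>'
    )
    print(strip_ssml(sample))
```

Stripping loses the pauses and pronunciation hints, so it suits only engines that would otherwise read the tags aloud or reject the input.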

Frequently Asked Questions

Do I need to learn SSML syntax? No. SoundWorks provides visual controls for inserting pauses and formatting content. You do not need to write raw SSML markup.

Does SSML work with local TTS engines? Partly. P-tag pauses work with the local engines (Silero, IndexTTS2, Qwen3), but Say As formatting support depends on the specific engine.

Can I preview SSML-formatted speech before generating the full file? Yes. Generate a short segment to verify your timing and pronunciation settings before processing the entire script.

Ready to get started?

Download SoundWorks Free