Guides & Tutorials · Beginner · Cantonese · Sichuan Dialect

Dialect TTS Getting Started: From Zero to First Voiceover

Beginner-friendly steps to choose a dialect, write scripts, call the API, and generate natural voiceovers.

XiangYinGe Team

1/15/2024 · Reading time: 4 min

What is Dialect TTS?

Dialect text-to-speech (TTS) converts written text into natural-sounding speech in a regional dialect. Unlike standard Mandarin TTS, it has to model each dialect's own phonetics, tone system, and everyday vocabulary. Cantonese, for example, is usually described as having six tones where Mandarin has four.

Why Choose XiangYinGe?

XiangYinGe focuses on AI voice synthesis for Chinese dialects. Our advantages include:

  • Wide Coverage: Supports 100+ dialect variants
  • Natural Quality: Uses advanced deep learning models
  • Smart Conversion: Automatically converts Mandarin text into authentic dialect expressions
  • Easy to Use: Exposes a standard RESTful API

Quick Start

Step 1: Register an Account

Visit our website and click the "Free Trial" button to register with your email.

Step 2: Get Your API Key

After logging in, copy your API key from the dashboard. Every API request uses this key for authentication.
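One common way to keep the key out of your source code is to read it from an environment variable. The sketch below assumes a variable named XIANGYINGE_API_KEY; that name and the helper functions are chosen for this example and are not prescribed by the service. The header format matches the request shown in Step 3.

```python
import os

# Hypothetical variable name chosen for this example; use whatever fits your setup.
ENV_VAR = "XIANGYINGE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable to your API key")
    return key


def auth_headers(api_key):
    """Build the Bearer-token headers used when calling the TTS endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message when the variable is unset is usually easier to debug than an authentication error returned by the server.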

Step 3: Send Your First Request

import requests

# API Configuration
API_KEY = "your_api_key_here"
API_URL = "https://api.xiangyinge.com/v1/tts"

# Request parameters
data = {
    "text": "How much does this cost",
    "dialect": "yue",  # Cantonese
    "voice": "female_1",
    "speed": 1.0
}

# Send request
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# The timeout keeps a stalled connection from hanging the script
response = requests.post(API_URL, json=data, headers=headers, timeout=30)

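# Save the audio on success; otherwise report what went wrong
if response.status_code == 200:
    with open("output.mp3", "wb") as f:
        f.write(response.content)
    print("Audio generated successfully!")
else:
    print(f"Request failed ({response.status_code}): {response.text}")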

Supported Dialect Types

Cantonese Series

  • Guangzhou Cantonese
  • Hong Kong Cantonese
  • Macau Cantonese

Sichuan Dialect

  • Chengdu Dialect
  • Chongqing Dialect
  • Zigong Dialect

Wu Chinese Series

  • Shanghai Dialect
  • Suzhou Dialect
  • Hangzhou Dialect

Best Practices

Text Preprocessing

Before sending requests, we recommend preprocessing the text:

  • Remove special characters
  • Standardize punctuation
  • Handle numbers and English text

Choose the Right Voice

Different scenarios suit different voices:

  • News Broadcasting: Choose formal, clear voices
  • Storytelling: Choose warm, emotional voices
  • Commercial Ads: Choose lively, energetic voices

Adjust Speed and Pitch

Adjust parameters based on content type:

params = {
    "speed": 0.9,    # Slightly slower for educational content
    "pitch": 1.1,    # Slightly higher for more energy
    "volume": 0.95,  # Moderate volume
}
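If you generate several kinds of content, it can help to keep these settings as named presets and merge them into each request body. This is a minimal sketch: the preset names and the build_payload helper are hypothetical, and only the parameter names and values shown in this guide are used.

```python
# Preset names are hypothetical. The "educational" values are the ones from the
# example above; "default" uses the speed from the Quick Start request.
PRESETS = {
    "default": {"speed": 1.0},
    "educational": {"speed": 0.9, "pitch": 1.1, "volume": 0.95},
}


def build_payload(text, dialect, voice, content_type="default", **overrides):
    """Combine the required fields, a preset, and any per-request overrides."""
    payload = {"text": text, "dialect": dialect, "voice": voice}
    payload.update(PRESETS[content_type])  # preset values first
    payload.update(overrides)              # explicit overrides win
    return payload


payload = build_payload("How much does this cost", "yue", "female_1", "educational")
print(payload)
```

The resulting dictionary can be passed as the json argument of the request in Step 3.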

Frequently Asked Questions

How are polyphonic characters handled?

Our system automatically identifies the correct pronunciation based on context. For special cases, you can use SSML markup.

Do you support batch processing?

Yes, we provide batch API endpoints to process multiple texts at once.

What audio formats are available?

We support MP3, WAV, OGG, and other formats, which can be specified in the request.

Advanced Features

SSML Support

Use Speech Synthesis Markup Language (SSML) for fine-grained control over speech output:

<speak>
  <prosody rate="slow">
    This sentence will be read slower.
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">
    This will be emphasized.
  </emphasis>
</speak>
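When SSML is assembled from user-supplied text, characters such as & and < must be escaped or the markup becomes invalid. The helpers below are a sketch of one way to do that with Python's standard library. The function names are made up for this example, and how the resulting string is passed to the API (for instance, which request field carries it) is not covered in this guide, so check the API reference.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_prosody(text, rate="slow"):
    """Wrap text in a <prosody> element with the given speaking rate."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def ssml_break(milliseconds):
    """Insert a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def ssml_emphasis(text, level="strong"):
    """Wrap text in an <emphasis> element."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def ssml_speak(*parts):
    """Join already-built SSML fragments inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


document = ssml_speak(
    ssml_prosody("Tom & Jerry"),
    ssml_break(500),
    ssml_emphasis("5 < 6"),
)
print(document)
```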

Emotion Control

Control voice emotion through the emotion parameter:

params = {
    "emotion": "happy",  # Options: happy, sad, angry, neutral
    "emotion_intensity": 0.8,
}
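Checking these values on the client before sending a request catches typos early. The sketch below uses the four emotion options listed above. The 0 to 1 range for emotion_intensity is an assumption based on the 0.8 example value, and the emotion_params helper is hypothetical.

```python
# Emotion options as listed in this guide.
EMOTIONS = {"happy", "sad", "angry", "neutral"}


def emotion_params(emotion, intensity=0.8):
    """Return validated emotion settings ready to merge into a request body."""
    if emotion not in EMOTIONS:
        raise ValueError(f"Unknown emotion {emotion!r}; expected one of {sorted(EMOTIONS)}")
    # Assumption: intensity is a fraction between 0 and 1.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("emotion_intensity must be between 0 and 1")
    return {"emotion": emotion, "emotion_intensity": intensity}


print(emotion_params("happy"))
```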

Conclusion

This tutorial covered the basics of XiangYinGe's dialect TTS service, from a simple text-to-speech request to SSML markup and emotion control.

Start your dialect content creation journey now! If you have any questions, feel free to contact our technical support team.

FAQ

  • Do I need to sign up to use dialect TTS?

    You can try the web demo instantly. API usage requires an account and API key.

  • Can Mandarin text be converted to dialect automatically?

    Yes, the system supports Mandarin-to-dialect conversion for natural expressions.

  • What scenarios work best?

    Short videos, live commerce, audiobooks, customer service, and localized marketing.

  • How can I improve naturalness?

    Use segmentation, SSML, emotion settings, and choose a voice suited to the scene.
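Segmentation, the first tip above, can be as simple as splitting a long script into sentences and synthesizing them one at a time. The following is a minimal sketch under the assumption that sentence-ending punctuation is a good enough boundary. The segment helper is hypothetical, and it does not handle abbreviations such as "Dr." followed by a space.

```python
import re

# Split after Chinese or Western sentence-ending marks. A period only counts
# when whitespace follows it, so decimals such as "3.5" stay intact. The
# negative lookahead keeps runs like "?!" together.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])(?![。！？!?])\s*|(?<=\.)\s+")


def segment(text):
    """Split text into sentences, keeping the closing punctuation on each one."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


for sentence in segment("你好。今日天氣好好！It costs 3.5 dollars. Thanks!"):
    print(sentence)
```

Each sentence can then be sent as its own request, which also makes it easier to regenerate a single line without redoing the whole voiceover.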