Dialect TTS Getting Started: From Zero to First Voiceover
Beginner-friendly steps to choose a dialect, write scripts, call the API, and generate natural voiceovers.
XiangYinGe Team
What is Dialect TTS?
Dialect Text-to-Speech (TTS) technology converts text into natural speech, specifically optimized for regional dialects. Unlike standard Mandarin TTS, dialect TTS needs to handle unique phonetic features, tonal variations, and linguistic habits.
Why Choose XiangYinGe?
XiangYinGe focuses on AI voice synthesis for Chinese dialects. Our advantages include:
- Wide Coverage: Support for 100+ dialect variants
- Natural Quality: Advanced deep learning models
- Smart Conversion: Automatically convert Mandarin to authentic dialect expressions
- Easy to Use: Standard RESTful API interface
Quick Start
Step 1: Register an Account
Visit our website and click the "Free Trial" button to register with your email.
Step 2: Get Your API Key
After logging in, obtain your API key from the dashboard for authentication.
Step 3: Send Your First Request
import requests

# API configuration
API_KEY = "your_api_key_here"
API_URL = "https://api.xiangyinge.com/v1/tts"

# Request parameters
data = {
    "text": "How much does this cost",
    "dialect": "yue",  # Cantonese
    "voice": "female_1",
    "speed": 1.0,
}

# Authenticate with the API key from the dashboard
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Send the request
response = requests.post(API_URL, json=data, headers=headers, timeout=30)

# Save the audio file, or report why the request failed
if response.status_code == 200:
    with open("output.mp3", "wb") as f:
        f.write(response.content)
    print("Audio generated successfully!")
else:
    print(f"Request failed: {response.status_code} {response.text}")
Supported Dialect Types
Cantonese Series
- Guangzhou Cantonese
- Hong Kong Cantonese
- Macau Cantonese
Sichuan Dialect
- Chengdu Dialect
- Chongqing Dialect
- Zigong Dialect
Wu Chinese Series
- Shanghai Dialect
- Suzhou Dialect
- Hangzhou Dialect
Best Practices
Text Preprocessing
Before sending requests, we recommend preprocessing the text:
- Remove special characters
- Standardize punctuation
- Handle numbers and English text
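The first two steps above might look like the following minimal sketch. The function name `preprocess_text` and the specific rules (which characters count as "special", which punctuation variants get mapped) are illustrative assumptions, not part of the XiangYinGe API. Number and English-text handling is left out because it depends on the target dialect.

```python
import re

# Assumed rule set: map curly quotes to straight quotes.
PUNCTUATION_MAP = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})

# Assumed list of "special" characters that carry no spoken meaning.
SPECIAL_CHARS = re.compile(r"[#*_~^<>|\\{}\[\]]")


def preprocess_text(text: str) -> str:
    """Clean a script before sending it to a TTS endpoint."""
    text = text.translate(PUNCTUATION_MAP)       # standardize quote marks
    text = SPECIAL_CHARS.sub("", text)           # remove special characters
    text = re.sub(r"([!?.,])\1+", r"\1", text)   # collapse repeats like "!!"
    text = re.sub(r"\s+", " ", text)             # collapse whitespace runs
    return text.strip()


print(preprocess_text("  Hello!!   **world**  "))
```

The cleaned string can then be passed as the `text` field of the request shown in Step 3.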
Choose the Right Voice
Different scenarios suit different voices:
- News Broadcasting: Choose formal, clear voices
- Storytelling: Choose warm, emotional voices
- Commercial Ads: Choose lively, energetic voices
Adjust Speed and Pitch
Adjust parameters based on content type:
{
    "speed": 0.9,   // Slightly slower for educational content
    "pitch": 1.1,   // Slightly higher for more energy
    "volume": 0.95  // Moderate volume
}
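One way to keep these settings consistent across scripts is a small preset table that is merged into each request. The sketch below is an assumption about how you might organize your own code: the helper `build_request`, the preset names, and the "advertisement" values are hypothetical. Only the parameter names and the educational speed of 0.9 come from the snippet above.

```python
# Presets keyed by content type. Values other than the educational speed
# are placeholders to tune by ear, not recommended settings.
PRESETS = {
    "educational": {"speed": 0.9, "pitch": 1.0, "volume": 0.95},
    "advertisement": {"speed": 1.1, "pitch": 1.1, "volume": 1.0},
}


def build_request(text, dialect, voice, content_type=None, **overrides):
    """Build a request payload; explicit overrides win over the preset."""
    payload = {"text": text, "dialect": dialect, "voice": voice}
    payload.update(PRESETS.get(content_type, {}))  # unknown type: no preset
    payload.update(overrides)
    return payload


payload = build_request("Lesson one", "yue", "female_1", "educational", pitch=1.05)
print(payload)
```

The resulting dictionary can be sent as the JSON body in the same way as `data` in the Quick Start example.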
Frequently Asked Questions
How are polyphonic characters handled?
Our system automatically identifies the correct pronunciation based on context. For special cases, you can use SSML markup.
Do you support batch processing?
Yes, we provide batch API endpoints to process multiple texts at once.
What audio formats are available?
We support MP3, WAV, OGG, and other formats, which can be specified in the request.
Advanced Features
SSML Support
Use Speech Synthesis Markup Language (SSML) for fine-grained control over speech output:
<speak>
    <prosody rate="slow">
        This sentence will be read slower.
    </prosody>
    <break time="500ms"/>
    <emphasis level="strong">
        This will be emphasized.
    </emphasis>
</speak>
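When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the markup becomes invalid. The following sketch builds the same structure as the example above. The helper name `build_ssml` is hypothetical, and how the SSML string is passed to the API (for example, in the `text` field) is not specified here, so check the API reference before sending it.

```python
from xml.sax.saxutils import escape


def build_ssml(slow_text: str, emphasized_text: str, pause_ms: int = 500) -> str:
    """Return an SSML document with a slow passage, a pause, and emphasis."""
    return (
        "<speak>"
        f'<prosody rate="slow">{escape(slow_text)}</prosody>'
        f'<break time="{int(pause_ms)}ms"/>'
        f'<emphasis level="strong">{escape(emphasized_text)}</emphasis>'
        "</speak>"
    )


print(build_ssml("Tom & Jerry", "now"))
```

Escaping with `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` with their XML entities, so script text cannot break out of the surrounding tags.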
Emotion Control
Control voice emotion through the emotion parameter:
{
    "emotion": "happy",  // Options: happy, sad, angry, neutral
    "emotion_intensity": 0.8
}
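Validating these fields before sending a request can catch typos early. The sketch below checks the emotion against the four options listed above. The helper name `emotion_params` is hypothetical, and the 0.0 to 1.0 range for `emotion_intensity` is an assumption, since only the value 0.8 is shown here.

```python
ALLOWED_EMOTIONS = {"happy", "sad", "angry", "neutral"}


def emotion_params(emotion: str, intensity: float = 0.8) -> dict:
    """Return emotion fields for a request, rejecting unknown values."""
    if emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    # Assumption: intensity is a 0.0-1.0 scale.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError(f"intensity out of range: {intensity}")
    return {"emotion": emotion, "emotion_intensity": intensity}


print(emotion_params("happy"))
```

The returned dictionary can be merged into the request body alongside `text`, `dialect`, and `voice`.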
Conclusion
This tutorial covered the basics of XiangYinGe's dialect TTS service, from a simple text-to-speech request to SSML markup and emotion control.
Start your dialect content creation journey now! If you have any questions, feel free to contact our technical support team.
FAQ
Do I need to sign up to use dialect TTS?
You can try the web demo instantly. API usage requires an account and API key.
Can Mandarin text be converted to dialect automatically?
Yes, the system supports Mandarin-to-dialect conversion for natural expressions.
What scenarios work best?
Short videos, live commerce, audiobooks, customer service, and localized marketing.
How can I improve naturalness?
Use segmentation, SSML, emotion settings, and choose a voice suited to the scene.
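Segmentation here means splitting a long script into sentence-sized pieces before synthesis. The sketch below is one possible approach: the helper name `split_into_segments` and the splitting rules are assumptions, not a documented feature of the service. It splits after Chinese or Western sentence-ending punctuation, and after a period only when whitespace follows, so decimals such as 3.5 stay intact.

```python
import re

# Split after 。！？!? (unless more punctuation follows), or after "." + whitespace.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])(?![。！？!?])\s*|(?<=\.)\s+")


def split_into_segments(script: str) -> list[str]:
    """Split a script into sentence-level segments, dropping empty pieces."""
    parts = _SENTENCE_END.split(script)
    return [part.strip() for part in parts if part.strip()]


for segment in split_into_segments("你好。今日好热！Hello there. Bye 3.5 now."):
    print(segment)
```

Each segment can then be synthesized with its own request, which also makes it easier to apply different emotion or speed settings per sentence.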