What is Amazon Polly?
Amazon Polly is a text-to-speech service that converts text into natural-sounding speech. It offers more than 60 voices in over 30 languages and can provide voice output for applications, accessibility features, and content creation.
For its neural voices, Polly uses a deep-learning engine called Neural Text-to-Speech (NTTS), which produces particularly natural-sounding speech. A simple API lets developers integrate the service quickly.
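A minimal sketch of a single synthesis call in Python, assuming the AWS SDK for Python (boto3). The helper name `synthesize_to_file` and the German neural voice `Vicki` are illustrative choices, not taken from the text above. The Polly client is passed in, so the function can also be exercised with a stand-in object and no AWS credentials.

```python
from typing import Any


def synthesize_to_file(
    client: Any,
    text: str,
    path: str,
    voice_id: str = "Vicki",      # assumption: a German voice with neural support
    engine: str = "neural",       # "neural" selects the NTTS engine, "standard" the classic one
    output_format: str = "mp3",
) -> int:
    """Synthesize `text` with Amazon Polly and write the audio to `path`.

    `client` is expected to be a boto3 Polly client (boto3.client("polly"))
    or any object with a compatible synthesize_speech method.
    Returns the number of audio bytes written.
    """
    response = client.synthesize_speech(
        Text=text,
        TextType="text",
        VoiceId=voice_id,
        Engine=engine,
        OutputFormat=output_format,
    )
    # AudioStream is a streaming body: read it completely, then save it.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


# Usage (requires the boto3 package and configured AWS credentials):
# import boto3
# polly = boto3.client("polly")
# synthesize_to_file(polly, "Willkommen bei Amazon Polly.", "willkommen.mp3")
```

Injecting the client rather than creating it inside the function keeps region and credential configuration in one place and makes the function easy to test.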
Core Features
- Neural Voices: Natural-sounding speech with NTTS technology
- 30+ Languages: German, English, French, Spanish, and many more
- SSML Support: Fine control over pronunciation, pauses, and emphasis
- Speech Marks: Timing information for lip-sync and text highlighting
- Lexicons: Custom pronunciation dictionaries
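The SSML control described above can be sketched with a few small string builders. This is a minimal illustration, not an official helper library: the function names are hypothetical, and it uses only the standard SSML elements `<speak>`, `<break>`, `<prosody>` and `<sub>`. Not every SSML tag is available for every voice engine, so the Polly SSML reference should be checked before relying on a tag with neural voices.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_text(text: str) -> str:
    """Plain text with &, < and > escaped so it is valid inside SSML."""
    return escape(text)


def ssml_break(ms: int) -> str:
    """A pause of `ms` milliseconds."""
    return f'<break time="{int(ms)}ms"/>'


def ssml_rate(text: str, rate: str) -> str:
    """Speak `text` at another rate, e.g. "slow", "fast" or "80%"."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def ssml_sub(text: str, alias: str) -> str:
    """Keep `text` in the markup but pronounce `alias` (e.g. expand an abbreviation)."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def ssml_document(*fragments: str) -> str:
    """Wrap already-built fragments in the required <speak> root element."""
    return "<speak>" + "".join(fragments) + "</speak>"


ssml = ssml_document(
    ssml_text("Welcome to "),
    ssml_sub("AWS", "Amazon Web Services"),
    ssml_text("."),
    ssml_break(500),
    ssml_rate("Please listen carefully.", "slow"),
)
# ssml == '<speak>Welcome to <sub alias="Amazon Web Services">AWS</sub>.'
#         '<break time="500ms"/><prosody rate="slow">Please listen carefully.</prosody></speak>'
# Send it with TextType="ssml", e.g. polly.synthesize_speech(Text=ssml, TextType="ssml", ...)
```

Escaping every text fragment matters: an unescaped `&` or `<` in user content would make the SSML document invalid and the request would fail.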
Typical Use Cases
Voice Assistants: Speech output for chatbots, IVR systems, and smart home devices. Neural voices make conversations sound more natural.
Accessibility: Reading web content, documents, and app text aloud for visually impaired users. Audio alternatives can help meet WCAG requirements.
Content Creation: Audio versions of articles, e-learning content, and podcasts. Automated production saves time and money.
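Long articles usually have to be split before synthesis, because a single `SynthesizeSpeech` request accepts only a limited amount of text (3,000 billed characters at the time of writing; check the current Polly quotas). The hypothetical helper below splits at sentence boundaries where possible; the default `max_chars` is an assumption to adjust. For very long content, the asynchronous `StartSpeechSynthesisTask` operation, which writes the audio to Amazon S3, is an alternative.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 3000) -> List[str]:
    """Split `text` into chunks of at most `max_chars` characters.

    Prefers to split after sentence-ending punctuation (., !, ?); a sentence
    longer than `max_chars` is split at whitespace, and a single word longer
    than `max_chars` is cut hard.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate          # piece still fits into this chunk
            else:
                chunks.append(current)       # close the chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_long(sentence: str, max_chars: int) -> List[str]:
    """Break one over-long sentence into word-aligned pieces."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        while len(word) > max_chars:         # hard-cut words that never fit
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Each chunk can then be synthesized separately and the resulting audio files played or concatenated in order.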
Benefits
- Natural-sounding speech with Neural TTS
- Pay-per-character without minimum fees
- Simple REST API for quick integration
- Support for German voices
Integration with innFactory
As an AWS Reseller, innFactory supports you with Amazon Polly: we help integrate Polly into your applications, optimize speech quality with SSML, and combine it with other AWS services such as Amazon Lex and Amazon Connect.
Frequently Asked Questions
What is Amazon Polly?
Amazon Polly is a text-to-speech service that converts text into natural-sounding speech. It offers over 60 voices in more than 30 languages, including neural voices with high speech quality.
What are neural voices?
Neural Text-to-Speech (NTTS) uses deep learning to synthesize more natural speech. Neural voices sound more human-like than standard voices, with better intonation and emphasis.
Which output formats are supported?
Polly delivers audio as MP3, OGG Vorbis, or PCM; Speech Marks are returned as JSON. Speech Marks provide timing information for lip-sync or text highlighting.
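A minimal sketch of using Speech Marks for text highlighting. It assumes the marks were requested with `OutputFormat="json"` and `SpeechMarkTypes` such as `["word", "sentence"]`, in which case Polly returns one JSON object per line. The AWS documentation describes `start` and `end` as byte offsets into the input text, which matters for German umlauts, so the helper slices the UTF-8 bytes. The names `SpeechMark`, `parse_speech_marks`, `word_at` and `mark_text` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechMark:
    time_ms: int            # offset from the start of the audio stream
    kind: str               # "word", "sentence", "viseme" or "ssml"
    value: str
    start: Optional[int]    # byte offset into the input text (absent for visemes)
    end: Optional[int]


def parse_speech_marks(payload: str) -> List[SpeechMark]:
    """Parse newline-delimited speech mark JSON."""
    marks = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        marks.append(SpeechMark(obj["time"], obj["type"], obj["value"],
                                obj.get("start"), obj.get("end")))
    return marks


def word_at(marks: List[SpeechMark], playback_ms: int) -> Optional[SpeechMark]:
    """Return the word being spoken at `playback_ms` (marks must be time-ordered)."""
    current = None
    for mark in marks:
        if mark.kind != "word":
            continue
        if mark.time_ms > playback_ms:
            break
        current = mark
    return current


def mark_text(text: str, mark: SpeechMark) -> str:
    """Slice the original input text using the mark's UTF-8 byte offsets."""
    if mark.start is None or mark.end is None:
        raise ValueError("this mark carries no text offsets")
    return text.encode("utf-8")[mark.start:mark.end].decode("utf-8")
```

A player can call `word_at` with its current playback position and highlight the span returned by `mark_text`.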
How can I customize pronunciation?
SSML tags enable control over pauses, emphasis, pronunciation, and speaking rate. Lexicons store custom pronunciation dictionaries.
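Lexicons use the W3C Pronunciation Lexicon Specification (PLS) XML format. The sketch below builds a minimal alias-only lexicon with the standard library; the helper name `build_alias_lexicon` and the example entries are hypothetical, and the lexicon's `xml:lang` is assumed to need to match the language of the voice it is used with. The result could be uploaded with `polly.put_lexicon(Name=..., Content=...)` and applied via the `LexiconNames` parameter of `synthesize_speech`.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_alias_lexicon(aliases: Dict[str, str], lang: str = "de-DE") -> str:
    """Build a PLS document that makes the voice say `alias` wherever `grapheme` appears."""
    root = ET.Element("lexicon", {
        "version": "1.0",
        "xmlns": PLS_NS,        # default namespace written as a plain attribute
        "alphabet": "ipa",
        XML_LANG: lang,
    })
    for grapheme, alias in aliases.items():
        lexeme = ET.SubElement(root, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = grapheme  # ElementTree escapes & and <
        ET.SubElement(lexeme, "alias").text = alias
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


lexicon = build_alias_lexicon({
    "AWS": "Amazon Web Services",
    "R&D": "Forschung und Entwicklung",
})
# Usage (requires boto3 and AWS credentials; "glossary" is a hypothetical name):
# polly.put_lexicon(Name="glossary", Content=lexicon)
# polly.synthesize_speech(Text="AWS", VoiceId="Vicki", Engine="neural",
#                         OutputFormat="mp3", LexiconNames=["glossary"])
```

Alias entries are the simplest lexicon type; PLS also allows `<phoneme>` entries with IPA transcriptions when an alias is not enough.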