Amazon Polly - Text-to-Speech · innFactory - Software Development, Cloud & AI

What is Amazon Polly?

Amazon Polly is a text-to-speech service that converts text into natural-sounding speech. The service offers over 60 voices in more than 30 languages and is suitable for applications, accessibility features, and content creation.

Polly's Neural Text-to-Speech (NTTS) engine uses deep learning to produce particularly natural-sounding voices. A basic integration needs only a single API call that takes text and returns an audio stream.
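As a rough illustration of how such an integration might look, the sketch below uses the boto3 SDK for Python. The SDK choice is an assumption, since this page does not name one. The helper names `build_request` and `synthesize_to_file`, the voice `Vicki`, and the output path are illustrative, and the call itself presumes that AWS credentials and a region are already configured.

```python
from typing import Any


def build_request(text: str, voice_id: str = "Vicki", neural: bool = True) -> dict[str, Any]:
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    return {
        "Text": text,
        "VoiceId": voice_id,  # illustrative choice of a German voice
        "OutputFormat": "mp3",
        "Engine": "neural" if neural else "standard",
    }


def synthesize_to_file(text: str, path: str = "speech.mp3") -> int:
    """Call Polly and write the audio to disk. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_request works without the SDK installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text))
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of audio bytes written
```

Keeping the request parameters in a separate function makes them easy to inspect or unit-test without touching the network.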

Core Features

Neural Voices: Natural-sounding speech with NTTS technology
30+ Languages: German, English, French, Spanish, and many more
SSML Support: Fine control over pronunciation, pauses, and emphasis
Speech Marks: Timing information for lip-sync and text highlighting
Lexicons: Custom pronunciation dictionaries
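To make the SSML feature above more concrete, here is a minimal sketch that assembles an SSML document with a fixed pause between sentences and a speaking rate. The function name `build_ssml` and its defaults are hypothetical; only the `<speak>`, `<break>`, and `<prosody>` tags are SSML itself.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 400, rate: str = "medium") -> str:
    """Join sentences with a pause and wrap them in a prosody element."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() protects characters such as & and < that would break the XML
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


ssml = build_ssml(["Willkommen bei Polly.", "Preise & Details folgen."])
print(ssml)
```

With boto3, a string like this would be passed as `Text` together with `TextType="ssml"` in the `synthesize_speech` call.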

Typical Use Cases

Voice Assistants: Speech output for chatbots, IVR systems, and smart home devices. Neural voices make conversations sound more natural.

Accessibility: Reading web content, documents, and apps aloud for visually impaired users. Audio alternatives can support WCAG compliance.

Content Creation: Audio versions of articles, e-learning content, and podcasts. Automated production reduces time and cost.

Benefits

Natural-sounding speech with Neural TTS
Pay-per-character without minimum fees
Simple REST API for quick integration
Support for German voices
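Because billing is per character, a cost estimate reduces to simple arithmetic. The sketch below shows one way to do it; the function name `estimate_cost` is hypothetical, and the rate is a placeholder supplied by the caller, not an actual AWS price. How characters are counted in detail should be checked against the current pricing documentation.

```python
def estimate_cost(texts: list[str], usd_per_million_chars: float) -> float:
    """Estimate synthesis cost from the total character count at a given rate."""
    total_chars = sum(len(t) for t in texts)
    return total_chars / 1_000_000 * usd_per_million_chars


# Placeholder rate for illustration only; look up the real price for your engine.
articles = ["a" * 250_000, "b" * 750_000]
print(estimate_cost(articles, usd_per_million_chars=10.0))
```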

Integration with innFactory

As an AWS Reseller, innFactory supports you with Amazon Polly: We help with integration into your applications, optimization of speech quality with SSML, and combination with other AWS services like Lex and Connect.

Frequently Asked Questions

What is Amazon Polly?

Amazon Polly is a text-to-speech service that converts text into natural-sounding speech. It offers over 60 voices in more than 30 languages, including neural voices with high speech quality.

What are neural voices?

Neural Text-to-Speech (NTTS) uses deep learning for more natural speech synthesis. Voices sound more human-like with better intonation and emphasis than standard voices.
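Not every voice is available with every engine, so it can help to filter the voice list before choosing one. The following sketch assumes the boto3 `describe_voices` response, where each voice entry carries an `Id` and a `SupportedEngines` list. The function name `voices_supporting` and the sample voice IDs are made up for illustration.

```python
def voices_supporting(voices: list[dict], engine: str = "neural") -> list[str]:
    """Return the Ids of voices whose SupportedEngines include the given engine."""
    return [v["Id"] for v in voices if engine in v.get("SupportedEngines", [])]


# Illustrative entries shaped like the "Voices" list of a DescribeVoices response.
# A real list would come from:
#   boto3.client("polly").describe_voices(LanguageCode="de-DE")["Voices"]
sample = [
    {"Id": "ExampleVoiceA", "LanguageCode": "de-DE", "SupportedEngines": ["neural", "standard"]},
    {"Id": "ExampleVoiceB", "LanguageCode": "de-DE", "SupportedEngines": ["standard"]},
]
print(voices_supporting(sample))
```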

Which output formats are supported?

Polly returns audio as MP3, OGG Vorbis, or PCM, and Speech Marks as JSON. Speech Marks provide timing information for lip-sync or text highlighting.
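Speech marks arrive as a stream of JSON objects, one per line, rather than as a single JSON array. The sketch below parses such output and looks up the word to highlight at a given playback position. The helper names and the sample data are hand-written for illustration; the field names (`time`, `type`, `start`, `end`, `value`) follow the speech-mark layout for word marks.

```python
import json
from typing import Optional


def parse_speech_marks(raw: str) -> list[dict]:
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks: list[dict], position_ms: int) -> Optional[str]:
    """Return the most recent word that started at or before the playback position."""
    current = None
    for mark in marks:
        if mark["type"] == "word" and mark["time"] <= position_ms:
            current = mark["value"]
    return current


# Hand-written sample, not real service output.
raw = "\n".join([
    '{"time": 0, "type": "word", "start": 0, "end": 5, "value": "Hallo"}',
    '{"time": 420, "type": "word", "start": 6, "end": 10, "value": "Welt"}',
])
marks = parse_speech_marks(raw)
print(word_at(marks, 500))
```

With boto3, marks of this kind would be requested by setting `OutputFormat="json"` and `SpeechMarkTypes=["word"]` in `synthesize_speech`.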

How can I customize pronunciation?

SSML tags enable control over pauses, emphasis, pronunciation, and speaking rate. Lexicons store custom pronunciation dictionaries.
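Lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format. As a minimal sketch, the function below builds a lexicon that expands written forms into spoken aliases. The name `build_lexicon`, the default language, and the example entry are assumptions made for illustration.

```python
from xml.sax.saxutils import escape


def build_lexicon(aliases: dict[str, str], lang: str = "de-DE") -> str:
    """Build a PLS lexicon that replaces each written form with a spoken alias."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(g)}</grapheme><alias>{escape(a)}</alias></lexeme>"
        for g, a in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang="{lang}">{lexemes}</lexicon>'
    )


content = build_lexicon({"AWS": "Amazon Web Services"})
print(content)
```

With boto3, the resulting string could be uploaded via `put_lexicon(Name="acronyms", Content=content)` and then referenced in `synthesize_speech` through `LexiconNames=["acronyms"]`; the lexicon name here is only an example.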
