Speech synthesis

Speech synthesis is the artificial generation of speech. It is usually carried out by a text-to-speech (TTS) system, i.e. a device or computer program that converts written text into acoustic signals. Speech synthesis is used, among other things, to help visually impaired people communicate.

The development of speech synthesis

As early as the late 18th century, researchers were attempting to reproduce human speech by machine. In 1937, the American Homer Dudley succeeded for the first time in reconstructing spoken utterances electronically with the help of a vocoder. Synthesis systems with phonetic input were developed in the early 1950s. About 20 years later, the first fully text-driven systems were available. Since then, the technology has been developed continuously, with a particular focus on optimising system structure and output quality.

Text-to-Speech (TTS) programs

Text-to-speech systems were developed first and foremost to make everyday life easier for people with impairments. Devices with speech output, such as computers, watches and dictionaries, allow people with visual or reading difficulties to access content they would not otherwise be able to access. For people with speech impairments, a speech synthesis system can provide an artificial voice. TTS systems are also used on customer portals, in infotainment systems and in interaction with machines and robots.

How do TTS systems work?

A text-to-speech system converts written text into speech in a two-step process. In the first step, the program analyses the input text from a linguistic point of view to determine the correct pronunciation. In the second step, the analysed content is converted into a synthetic speech signal. The software used to convert writing into speech is called a speech synthesiser.
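
The two steps can be sketched in Python as follows. This is only a minimal illustration and not how a production synthesiser works: the function names (analyse, synthesise, text_to_speech), the digit table and the sample rate are assumptions made for this example, and the second step emits a placeholder sine tone per letter rather than real speech.

```python
import math
import re

SAMPLE_RATE = 8000  # samples per second (illustrative value, an assumption)

# Hypothetical mini-table used to spell out digits during text analysis.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def analyse(text):
    """Step 1: linguistic analysis, reduced here to normalising and tokenising."""
    # Replace each digit with its spelled-out word, padded with spaces.
    spelled = re.sub(r"\d", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    # Lowercase and keep only runs of letters as words.
    return re.findall(r"[a-z]+", spelled.lower())


def synthesise(words, unit_ms=50):
    """Step 2: signal generation, using a placeholder sine tone per letter."""
    samples_per_unit = SAMPLE_RATE * unit_ms // 1000
    samples = []
    for word in words:
        for letter in word:
            # Map each letter to an arbitrary pitch; real systems model speech sounds.
            freq = 200.0 + 20.0 * (ord(letter) - ord("a"))
            samples.extend(
                math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
                for i in range(samples_per_unit)
            )
        samples.extend([0.0] * samples_per_unit)  # short pause after each word
    return samples


def text_to_speech(text):
    """Run both steps in sequence: analyse the text, then generate a signal."""
    return synthesise(analyse(text))
```

Calling analyse("Room 42") yields ['room', 'four', 'two'], which shows how the analysis step turns digits into pronounceable words before any signal is generated.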

FAQ: More questions about speech synthesis

What is TTS?

Text to speech (abbreviation: TTS) refers to a method for converting written text into speech. This is a form of speech synthesis.

What does speech synthesis mean?

Speech synthesis is the artificial generation of human speech. Various devices and programs, such as TTS software, can be used for this purpose.

What are the approaches to speech synthesis?

Two approaches to generating speech signals are distinguished: the rule-based approach and the lexicon-based approach. Most text-to-speech systems use the two in combination.
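
One way to picture the combination of the two approaches is pronunciation lookup: a lexicon supplies stored pronunciations for known words, and general rules cover everything else. The Python sketch below rests on that assumption; the lexicon entries, the letter-to-sound rules and the phoneme symbols are hypothetical and far smaller than anything a real system would use.

```python
# Hypothetical exception lexicon: word -> phoneme symbols.
LEXICON = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "the": ["DH", "AH"],
}

# Hypothetical letter-to-sound rules: two-letter patterns are tried first.
DIGRAPHS = {"sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def rule_based(word):
    """Rule-based approach: derive phonemes from spelling patterns alone."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.extend(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a rule are skipped in this sketch.
            phonemes.extend(LETTERS.get(word[i], []))
            i += 1
    return phonemes


def to_phonemes(word):
    """Combined approach: consult the lexicon first, fall back to the rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    return rule_based(word)
```

With these tables, to_phonemes("ship") falls through to the rules and gives ['SH', 'IH', 'P'], while to_phonemes("two") is answered from the lexicon as ['T', 'UW']. Applying rule_based("two") directly gives ['T', 'W', 'AA'], which illustrates why irregular words are better handled by a lexicon entry.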

What is neural speech synthesis?

Neural speech synthesis refers to a form of speech generation based on machine learning. An artificial neural network is trained to predict the sound of human speech, and its output improves as training progresses. The result is a more fluid and natural-sounding voice.

Where is text to speech used?

While text to speech was initially used mainly to help people with disabilities communicate or to provide them with barrier-free access to content, it can now be used wherever text needs to be converted into speech, e.g. in customer service portals or when using smart devices.

