What is a character set?

A character set is the totality of all characters used to represent information. Characters are, for example, the letters of an alphabet or digits, but also other symbols such as special characters, pictograms and control characters. In electronic data processing (EDP), the number of characters in a character set is limited by the number of bits available per character: n bits can distinguish at most 2^n characters.
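The relationship between bit width and character count can be checked with a few lines of Python. This is a minimal sketch; the function name `max_characters` is chosen here for illustration and does not come from any standard.

```python
def max_characters(bits: int) -> int:
    """Return how many distinct bit patterns fit into the given number of bits."""
    return 2 ** bits


# Bit widths of the character sets discussed in this article.
for bits in (5, 7, 8):
    print(f"{bits} bits -> {max_characters(bits)} possible characters")
```

Each extra bit doubles the number of characters that can be told apart.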

Character set in IT

Computers and digital circuits can only store and process the symbols 0 and 1 (binary digits). Each character is therefore stored as a sequence of bits, known as a bit code. There are approximately 100 important characters, including numbers, letters, umlauts, punctuation marks, symbols, special characters, control characters and formula characters, for which 7 bits (128 possible codes) are sufficient. The character set determines which character corresponds to which bit code. Because the Internet is used internationally, character codes must be standardised so that data can be exchanged smoothly regardless of language.
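As an illustration of a character-to-bit-code mapping, the following sketch prints the 7-bit codes of a few characters. It assumes the ASCII assignment; Python's built-in `ord` returns the Unicode code point, which coincides with ASCII for the first 128 characters. The helper name `bit_code` is hypothetical.

```python
def bit_code(char: str, width: int = 7) -> str:
    """Return the code of a single character as a bit string of fixed width."""
    code = ord(char)  # Unicode code point; identical to ASCII below 128
    if code >= 2 ** width:
        raise ValueError(f"{char!r} does not fit into {width} bits")
    return format(code, f"0{width}b")


for ch in ("A", "a", "7", "?"):
    print(ch, ord(ch), bit_code(ch))
```

A character whose code needs more than the given width raises a `ValueError` instead of being silently truncated.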

Development of the character set

The idea of giving meaning to signals evolved early on. With the development of electrical telegraphy in 1837, electrical pulses were used for the first time to transmit characters. To make the transmitted message readable, the signals first had to be converted back into characters. For this purpose, pointer telegraphs and teleprinters were developed around 1900 that converted signals into legible text.

Coding was revolutionised by the French engineer Émile Baudot, who represented text as sequences of five binary digits. The 32 possible signals, formed by combinations of 5 keys, had to be entered by the sender themselves. This was the birth of the first 5-bit character set.

Since computers require a larger unit for data processing, the 7-bit character set ASCII was developed in 1963 and was the standard character set in IT for a long time. The first 8-bit character set, EBCDIC, was created at around the same time as ASCII and remained in use on mainframe computers for decades. With 8 bits, it can assign 256 different characters. In order to represent all the languages of the world in one character set, a universal character set was developed at the end of the 1980s: Unicode.

ASCII, ISO and Unicode…

A character set on a PC comprises not only the individual characters but also the rules for encoding them. The best-known character encodings are ASCII, the ISO/IEC 8859 family and the internationally standardised Unicode. In addition, there are vendor-specific character sets and national variants.
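The difference between these encodings shows up as soon as a character outside the basic Latin letters is encoded. The sketch below uses Python's built-in codecs `ascii`, `iso-8859-1` and `utf-8`; the helper `encode_hex` is a hypothetical name introduced only for this example.

```python
def encode_hex(text, encoding):
    """Encode text and return its bytes as hex, or None if not representable."""
    try:
        data = text.encode(encoding)
    except UnicodeEncodeError:
        return None
    return " ".join(f"{byte:02x}" for byte in data)


U_UMLAUT = "\u00fc"  # the letter u with diaeresis

for encoding in ("ascii", "iso-8859-1", "utf-8"):
    print(encoding, encode_hex("A", encoding), encode_hex(U_UMLAUT, encoding))
```

The letter A has the same byte in all three encodings, while the umlaut is missing from ASCII, takes one byte in ISO/IEC 8859-1 and two bytes in UTF-8.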

FAQ: More questions about character sets

Which character encodings exist for Unicode?

There are three different character encodings for Unicode: UTF-8, UTF-16 and UTF-32. They are built from units of 8, 16 and 32 bits respectively.
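How the three encodings differ in storage size can be seen by encoding a few sample characters. This is a minimal sketch using Python's built-in codecs; the big-endian variants `utf-16-be` and `utf-32-be` are chosen here so that no byte order mark is added to the count, and the function name `byte_lengths` is illustrative.

```python
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")

# Sample characters: Latin letter, umlaut, euro sign, emoji (U+1F600).
SAMPLES = ("A", "\u00fc", "\u20ac", "\U0001F600")


def byte_lengths(char):
    """Return the number of bytes one character occupies in each encoding."""
    return {encoding: len(char.encode(encoding)) for encoding in ENCODINGS}


for char in SAMPLES:
    print(f"U+{ord(char):04X}", byte_lengths(char))
```

UTF-8 and UTF-16 are variable-length, so the byte count grows with the code point, whereas UTF-32 always uses 4 bytes.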

What is Unicode?

Unicode is the international standard for encoding characters or text elements. It assigns every character a unique number (code point), which enables text in any writing system to be stored and processed in digital systems.

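How many characters can UTF-8 encode?

Without the restriction imposed by Unicode, 4,398,046,511,104 (2^42) character mappings would be possible with UTF-8. Because Unicode limits UTF-8 to sequences of at most 4 bytes, the effective number is 2^21, which corresponds to 2,097,152 mappings. Of these, Unicode only uses the code points U+0000 to U+10FFFF, which is 1,114,112 in total.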
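The figures in this answer can be verified with simple arithmetic. The sketch below reads them as powers of two (2^42 and 2^21) and, as an addition not stated in the original answer, compares them with the size of the Unicode code space, which runs from U+0000 to U+10FFFF. The variable names are illustrative.

```python
# Theoretical figure quoted above, expressed as a power of two.
theoretical_mappings = 2 ** 42

# A 4-byte UTF-8 sequence has the shape 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx,
# which leaves 3 + 6 + 6 + 6 = 21 payload bits.
payload_bits = 3 + 3 * 6
four_byte_mappings = 2 ** payload_bits

# Unicode's code space: U+0000 up to and including U+10FFFF.
unicode_code_points = 0x10FFFF + 1

print(f"{theoretical_mappings:,}")
print(f"{four_byte_mappings:,}")
print(f"{unicode_code_points:,}")
```

The last figure is smaller than 2^21 because Unicode does not use the full range that 21 bits would allow.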

How do you enter characters that are not on the keyboard?

Many special characters can be inserted using key combinations. You can look them up at https://tools.oratory.com/altcodes.html, for example.

What are special characters?

Special characters are all characters other than the letters of the basic Latin alphabet and the digits. These include punctuation marks (? ! . , ; : –), symbols (§ / # $ %), ligatures of two letters (ß æ œ) and letters with so-called diacritical marks (ü á ô è ñ).

What is a character set?

A character set is the set of all characters used to represent information. Which characters are available depends on the system used to display them.

What is character encoding?

In computing, character encoding (also character coding) refers to the assignment of a unique code, such as a number or a bit sequence, to each character of a character set, so that text can be stored and transmitted.
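Encoding and decoding only work reliably when both sides agree on the same character encoding. The following sketch encodes a word as UTF-8 and then decodes it once correctly and once, deliberately, as ISO/IEC 8859-1. The sample word and the variable names are chosen for illustration.

```python
original = "Z\u00fcrich"  # "Zurich" written with u-umlaut

# Encoding: characters -> bytes.
data = original.encode("utf-8")

# Decoding with the same encoding restores the text.
roundtrip = data.decode("utf-8")

# Decoding with a different encoding misreads the two umlaut bytes
# as two separate characters.
garbled = data.decode("iso-8859-1")

print(data)
print(roundtrip)
print(garbled)
```

The garbled result is the familiar effect seen when a web page or file is opened with the wrong encoding setting.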

What is the term for a character set in printing?

In printing, a character set is called a font.
