character encoding

A character encoding maps the code positions of a character set to byte values. In the case of the ASCII character set, the character encoding is an identity mapping: code position 65 maps to the byte value 65. This is possible because ASCII uses only code positions representable as single bytes, i.e., values between 0 and 255. (In fact, US-ASCII uses only the values 0 to 127.)

From the late 1990s, there was increased use of larger character sets such as Unicode and many CJK coded character sets. These can represent characters from many languages, as well as more symbols. Unicode uses many more than the 256 code positions that can be represented by one byte, so it requires more complex mappings: sometimes the characters are mapped onto pairs of bytes (see DBCS). In many cases, this breaks programs that assume a one-to-one mapping of bytes to characters and so, for example, treat any occurrence of the byte value 13 as a carriage return, even when that byte is only one half of a two-byte character. To avoid this problem, character encodings such as UTF-8 were devised, in which the byte values 0 to 127 only ever stand for the corresponding ASCII characters.
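The byte value 13 problem can be illustrated with a short Python sketch. The sample character (U+010D) and the use of UTF-16 as the two-byte encoding are choices made here for illustration, not taken from the entry, and the helper name contains_cr_byte is hypothetical.

```python
# Illustrative sketch only: the sample character and the choice of UTF-16
# as the "pairs of bytes" encoding are assumptions made for this example.

def contains_cr_byte(data: bytes) -> bool:
    """Naive check that treats any byte with value 13 as a carriage return."""
    return 13 in data

# ASCII: the encoding is an identity mapping, so code position 65 is byte 65.
ascii_bytes = "A".encode("ascii")
print(list(ascii_bytes))             # [65]

# U+010D (LATIN SMALL LETTER C WITH CARON) is not a carriage return,
# but its big-endian two-byte form contains the byte value 13.
sample = "\u010d"
utf16_bytes = sample.encode("utf-16-be")
print(list(utf16_bytes))             # [1, 13]
print(contains_cr_byte(utf16_bytes)) # True: a false positive

# In UTF-8 every byte of a multi-byte character is 128 or above,
# so the naive check is no longer fooled.
utf8_bytes = sample.encode("utf-8")
print(list(utf8_bytes))              # [196, 141]
print(contains_cr_byte(utf8_bytes))  # False
```

A byte-oriented program that scans for 13 would therefore misread the two-byte form but handle the UTF-8 form correctly, which is the behaviour the entry attributes to UTF-8.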
Last updated: 2015-11-29