UTF is the abbreviation of UCS Transformation Format (from the ISO 10646 standard); it is also called Unicode Transformation Format. UTF-16 has traditionally been regarded as the default encoding form for Unicode.
UTF-7 is described in RFC 2152 (UTF-7: A Mail-Safe Transformation Format of Unicode).
It's a 7-bit encoding method.
It was developed for use in mail environments and uses shift sequences:
+ : start encoding
- : stop encoding
Example: Hi Mom ☺!
|hex||0048 0069 0020 004D 006F 006D 0020||263A||0021|
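The shift mechanism can be illustrated with a small Python sketch. It assumes the UTF-7 form `Hi Mom +Jjo-!` for the example above (the base64 run `Jjo` carries the UTF-16 value 263A), and the helper name `utf7_shift_payload` is hypothetical, introduced only for this illustration.

```python
import base64


def utf7_shift_payload(b64_text: str) -> bytes:
    """Decode the modified-base64 text found between '+' and '-'.

    UTF-7 omits the '=' padding, so it is restored here before decoding.
    The result is the UTF-16 big-endian bytes of the shifted characters.
    """
    padded = b64_text + "=" * (-len(b64_text) % 4)
    return base64.b64decode(padded)


# "Jjo" is the shifted part of the assumed UTF-7 string "Hi Mom +Jjo-!".
payload = utf7_shift_payload("Jjo")
print(payload.hex())               # 263a
print(payload.decode("utf-16-be")) # the smiley, U+263A

# Python's built-in codec decodes the whole string the same way.
print(b"Hi Mom +Jjo-!".decode("utf-7"))
```

Everything outside the `+` ... `-` pair is plain ASCII and passes through unchanged, which is what makes the format safe for 7-bit mail transport.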
UTF-8 is described in RFC 2279 (UTF-8, a transformation format of ISO 10646), which has since been obsoleted by RFC 3629. It was formerly known as UTF-2.
It's an 8-bit variable-length encoding method. All Unicode characters with a value smaller than 128 are transmitted as is; the rest are encoded as multi-byte sequences. Because it is interpreted as a sequence of bytes, there is no endianness problem (that problem only affects encoding forms that use 16-bit or 32-bit code units). If a BOM is used, it only serves to distinguish UTF-8 from other UTF encodings and has nothing to do with byte order.
Example: 日 本 語
|utf-8||E6 97 A5||E6 9C AC||E8 AA 9E|
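As a rough sketch of how the variable-length scheme produces the bytes in the example, the following Python function encodes a single code point by hand. The function name `utf8_encode` is an assumption made for illustration, and the sketch does not reject surrogates or out-of-range values as a conforming encoder would.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (no validation)."""
    if cp < 0x80:
        # Values below 128 are transmitted as is.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# The three characters from the example: U+65E5, U+672C, U+8A9E.
for cp in (0x65E5, 0x672C, 0x8A9E):
    print(utf8_encode(cp).hex(" ").upper())
```

The result can be checked against Python's built-in codec with `chr(cp).encode("utf-8")`.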
UTF-16 is described in RFC 2781 (UTF-16, an encoding of ISO 10646).
Characters are represented using either one or two unsigned 16-bit integers, depending on the character value. All characters that can be represented in UTF-16 can be represented as a single 32-bit unit in UTF-32.
The UTF-16 sequence
  [0048] [0069] [D800] [DC00] [0021] [0021]
is mapped to UCS-4 as
[0000 0048] [0000 0069] [0001 0000] [0000 0021] [0000 0021]
and represents "Hi<0001 0000>!!".
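The mapping from 16-bit units to UCS-4 values can be sketched in Python as follows. The input is the six-unit sequence 0048 0069 D800 DC00 0021 0021 that corresponds to the UCS-4 values above. The function name `utf16_to_code_points` is hypothetical, and unpaired surrogates are simply passed through rather than reported as errors.

```python
def utf16_to_code_points(units):
    """Combine UTF-16 code units into code points (UCS-4 values)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A surrogate pair: 10 bits from each unit, offset by 0x10000.
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            code_points.append(0x10000 + (high << 10) + low)
            i += 2
        else:
            # A single unit maps directly to the same value.
            code_points.append(unit)
            i += 1
    return code_points


units = [0x0048, 0x0069, 0xD800, 0xDC00, 0x0021, 0x0021]
print(["%08X" % cp for cp in utf16_to_code_points(units)])
```

The pair D800 DC00 is the smallest surrogate pair, so it yields 0001 0000, the first code point beyond the 16-bit range.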
UTF-32 encodes a Unicode code point as a sequence of 4 bytes, in either big-endian or little-endian order. An initial sequence corresponding to U+FEFF is interpreted as a BOM (byte order mark) and is used to distinguish between the two byte orders. The BOM is not considered part of the content of the text.
UTF-32 was originally specified in Unicode Standard Annex #19: UTF-32. It has since been incorporated into the core specification of the Unicode Standard.
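A minimal Python sketch of the BOM handling described above might look like this. The function name `decode_utf32` is an assumption for illustration, and treating input without a BOM as big-endian is a choice made for this sketch only.

```python
import struct


def decode_utf32(data: bytes) -> str:
    """Decode UTF-32 bytes, using a leading U+FEFF to pick the byte order."""
    if data[:4] == b"\x00\x00\xFE\xFF":
        order, body = ">", data[4:]   # big-endian BOM, not part of the text
    elif data[:4] == b"\xFF\xFE\x00\x00":
        order, body = "<", data[4:]   # little-endian BOM, not part of the text
    else:
        order, body = ">", data       # assumption: no BOM means big-endian
    count = len(body) // 4            # each code point occupies 4 bytes
    code_points = struct.unpack(f"{order}{count}I", body[: count * 4])
    return "".join(chr(cp) for cp in code_points)


# U+5317 U+4EAC in little-endian order, preceded by the BOM.
sample = b"\xFF\xFE\x00\x00" + b"\x17\x53\x00\x00" + b"\xAC\x4E\x00\x00"
print(decode_utf32(sample))
```

Python's own `utf-32` codec applies the same BOM check, so `sample.decode("utf-32")` can be used to compare results.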
Example: Beijing 北京
|UTF-8 (hex)||42 65 69 6A 69 6E 67||E5 8C 97 E4 BA AC|
|UTF-16 (hex)||0042 0065 0069 006A 0069 006E 0067||5317 4EAC|