[Rat] Gengzi year, Guiwei month, Wuwu day / 25th day of the fifth lunar month
Tuesday July 14, 2020

EUC


EUC stands for Extended Unix Code. It is a multibyte encoding standard, originally developed by AT&T and supported on all System V implementations, that is used to represent large Asian character sets. There are several variants; two of them are for Chinese.
It defines both a fixed-length and a variable-length encoding. It is an 8-bit coding method.

The structure is based on the ISO 2022 standard. Up to four codesets can be defined. The layout is based on a 94 x 94 grid, so each plane can contain up to 8,836 (94 x 94) characters.
Codeset 0 is often the local version of ASCII. If codeset 0 is ASCII itself, the EUC codeset is ASCII transparent.
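
The grid arithmetic can be illustrated with a small sketch. It assumes the common two-byte case, in which row and cell numbers 1 to 94 are each mapped onto the byte range 0xA1 to 0xFE; the names grid_to_euc and euc_to_grid are made up for this illustration and are not part of any standard API.

```python
FIRST = 0xA1  # assumed first byte value of the 94-value range 0xA1..0xFE
SIZE = 94     # rows and cells per grid


def grid_to_euc(row, cell):
    """Map a 1-based (row, cell) grid position to two EUC bytes."""
    if not (1 <= row <= SIZE and 1 <= cell <= SIZE):
        raise ValueError("row and cell must be in 1..94")
    return bytes([FIRST + row - 1, FIRST + cell - 1])


def euc_to_grid(pair):
    """Map two EUC bytes back to a 1-based (row, cell) position."""
    hi, lo = pair
    if not (FIRST <= hi < FIRST + SIZE and FIRST <= lo < FIRST + SIZE):
        raise ValueError("bytes must be in 0xA1..0xFE")
    return hi - FIRST + 1, lo - FIRST + 1


GRID_CAPACITY = SIZE * SIZE  # 8836 positions in one 94 x 94 grid
```

Under this assumption, grid_to_euc(16, 1) gives the bytes B0 A1, and both bytes of every position have the eighth bit set.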
A legal EUC codeset must follow these rules:
1) Each character of an EUC multibyte string is chosen from among four distinct multibyte codesets (0, 1, 2, and 3).
2) Codeset 0 must be a 7-bit codeset.
3) No multibyte character of codeset 1 will use either SS2 or SS3 as its first byte.
4) Characters from codeset 2 will be preceded by the byte SS2 (0x8E).
5) Characters from codeset 3 will be preceded by the byte SS3 (0x8F).
6) For codesets 1, 2, and 3, every byte of every character must have the eighth bit set.
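
The rules above can be sketched as a small splitter that assigns each character of a byte string to its codeset. This is only an illustration, not a full decoder: the function name split_euc and the WIDTHS table are hypothetical, and the character widths are an assumption taken from EUC-JP (other variants use different widths, so pass a different table for them).

```python
SS2 = 0x8E  # single shift 2: introduces a codeset 2 character
SS3 = 0x8F  # single shift 3: introduces a codeset 3 character

# Assumed bytes per character, not counting the single-shift byte.
# These values match EUC-JP; other EUC variants differ.
WIDTHS = {1: 2, 2: 1, 3: 2}


def split_euc(data, widths=WIDTHS):
    """Split EUC bytes into (codeset, raw_bytes) pairs, checking rules 2-6."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:          # rule 2: codeset 0 is 7-bit
            codeset, prefix, n = 0, 0, 1
        elif lead == SS2:        # rule 4: SS2 precedes codeset 2
            codeset, prefix, n = 2, 1, widths[2]
        elif lead == SS3:        # rule 5: SS3 precedes codeset 3
            codeset, prefix, n = 3, 1, widths[3]
        else:                    # rule 3: anything else starts codeset 1
            codeset, prefix, n = 1, 0, widths[1]
        end = i + prefix + n
        if end > len(data):
            raise ValueError(f"truncated character at offset {i}")
        raw = data[i:end]
        # Rule 6: every byte outside codeset 0 has the eighth bit set.
        if codeset != 0 and not all(b & 0x80 for b in raw):
            raise ValueError(f"byte without eighth bit set at offset {i}")
        chars.append((codeset, raw))
        i = end
    return chars
```

For example, split_euc(b"A\xa4\xa2\x8e\xb1") yields one codeset 0 character, one codeset 1 character, and one codeset 2 character that keeps its SS2 prefix.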

EUC-TW

EUC-CN
