tcs – translate character sets

tcs [ –slcv ] [ –f ics ] [ –t ocs ] [ file ... ]

Tcs interprets the named file(s) (standard input by default) as a stream of characters from the ics character set or format, converts them to runes, and then converts them into a stream of characters from the ocs character set or format on the standard output. The default value for both ics and ocs is utf, the UTF encoding described in utf(6). The –l option lists the character sets known to tcs. Processing continues in the face of conversion errors; the –s option suppresses the reporting of these errors. The –c option forces the output to contain only correctly converted characters; otherwise, Runeerror (0xFFFD) characters are substituted for UTF encoding errors and unknown characters.
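
The two-stage conversion described above (input bytes to runes, then runes to output bytes) can be sketched in Python. This is only an illustration using Python's own codecs, not the tcs implementation: the function name translate and its parameters are hypothetical, the codec names are Python's rather than the names tcs accepts, and Python writes "?" instead of 0xFFFD when the output set cannot represent a character.

```python
RUNEERROR = "\ufffd"  # the replacement rune named in the text above


def translate(data: bytes, ics: str = "utf-8", ocs: str = "utf-8",
              clean: bool = False) -> bytes:
    """Decode data from ics into code points, then encode them into ocs.

    clean=True loosely mimics the -c option: anything that did not
    convert correctly is dropped instead of being replaced.
    """
    # Stage 1: bytes -> code points. Malformed input becomes U+FFFD.
    text = data.decode(ics, errors="replace")
    if clean:
        # Simplification: this also drops any U+FFFD that was genuinely
        # present in the input.
        text = text.replace(RUNEERROR, "")
        return text.encode(ocs, errors="ignore")
    # Stage 2: code points -> bytes. Python substitutes "?" for characters
    # missing from a non-Unicode output set.
    return text.encode(ocs, errors="replace")
```

For instance, translate(b"caf\xe9", ics="latin-1") yields the UTF-8 bytes for "café", while a stray 0xFF byte in UTF input is either replaced by the encoding of U+FFFD or, with clean=True, omitted.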

The –v option generates various diagnostic and summary information on standard error, or makes the –l output more verbose.

Tcs recognizes an ever–changing list of character sets. In particular, it supports a variety of Russian and Japanese encodings. Some of the supported encodings are:
utf         The Plan 9 UTF encoding, known by ISO as UTF–8
utf1        The deprecated original UTF encoding from ISO 10646
ascii       7–bit ASCII
8859–1      Latin–1 (Western European)
8859–2      Latin–2 (Czech .. Slovak)
8859–3      Latin–3 (Dutch .. Turkish)
8859–4      Latin–4 (Scandinavian)
8859–5      Part 5 (Cyrillic)
8859–6      Part 6 (Arabic)
8859–7      Part 7 (Greek)
8859–8      Part 8 (Hebrew)
8859–9      Latin–5 (Finnish .. Portuguese)
html        Unicode as encoded by HTML
koi8        KOI–8 (GOST 19769–74)
jis–kanji   ISO 2022–JP
ujis        EUC–JX: JIS 0208
ms–kanji    Microsoft, or Shift–JIS
jis         (from only) guesses between ISO 2022–JP, EUC or Shift–JIS
gb          Chinese national standard (GB2312–80)
big5        Big 5 (HKU version)
unicode     Unicode Standard 1.0
tis         Thai character set plus ASCII (TIS 620–1986)
msdos       IBM PC: CP 437
atari       Atari–ST character set
nfd         Unicode Normalization Form D
nfc         Unicode Normalization Form C
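
The nfd and nfc entries name the Unicode normalization forms rather than byte encodings. As a rough illustration of what the two forms do (using Python's unicodedata module, not tcs itself; the helper names to_nfd and to_nfc are hypothetical):

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Decompose characters into base letters plus combining marks (Form D)."""
    return unicodedata.normalize("NFD", text)


def to_nfc(text: str) -> str:
    """Recombine base letters and combining marks where possible (Form C)."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent
```

Here to_nfd(composed) gives the two-code-point sequence in decomposed, and to_nfc(decomposed) gives back the single code point in composed.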

tcs –f 8859–1
Convert 8859–1 (Latin–1) characters into UTF format.
tcs –s –f jis
Convert characters encoded in one of several JIS encodings (ISO 2022–JP, EUC or Shift–JIS) into UTF format. Unknown Kanji will be converted into 0xFFFD characters.
tcs –t html
Convert UTF into character set–independent HTML.
tcs –lv
Print an up–to–date, verbose list of the supported character sets.
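
To make "character set–independent HTML" concrete, the sketch below writes every non-ASCII character as a numeric character reference, so the result survives any ASCII-compatible transport. This is an assumption about the general idea, not a description of the exact output of tcs –t html, which may differ (for example by using named entities). The function names are hypothetical.

```python
import html


def to_html_ascii(text: str) -> bytes:
    """Encode text as ASCII, writing other characters as &#NNN; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def from_html_ascii(data: bytes) -> str:
    """Reverse the conversion by expanding character references."""
    return html.unescape(data.decode("ascii"))
```

With these helpers, to_html_ascii("na\u00efve") produces b"na&#239;ve", and from_html_ascii restores the original string.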


ascii(1), rune(2), utf(6).