The groff command’s -k option calls the preconv preprocessor to perform input character encoding conversions. Input to the GNU troff formatter itself, on the other hand, must be in one of two encodings it can recognize.
cp1047
The code page 1047 input encoding works only on EBCDIC platforms (and conversely, the other input encodings don’t work with EBCDIC); the file cp1047.tmac is loaded at startup.
latin1
ISO Latin-1, an encoding for Western European languages, is the default input encoding on non-EBCDIC platforms; the file latin1.tmac is loaded at startup.
Any document that is encoded in ISO 646:1991 (a descendant of USAS X3.4-1968 or “US-ASCII”), or, equivalently, uses only code points from the “C0 Controls” and “Basic Latin” parts of the Unicode character set, is also a valid ISO Latin-1 document; the standards are interchangeable in their first 128 code points.29
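The overlap between US-ASCII and ISO Latin-1 in the first 128 code points can be checked mechanically. The following is a minimal sketch using Python’s standard codecs; the helper name is_ascii_clean and the sample input are hypothetical and not part of groff.

```python
# Illustrative only: is_ascii_clean is a hypothetical helper, not a groff tool.
def is_ascii_clean(data):
    """Return True if every byte is below 0x80 (the US-ASCII range)."""
    return all(b < 0x80 for b in data)


# A made-up document fragment that uses only Basic Latin characters.
sample = b".TH EXAMPLE 1\nHello, world!\n"

if is_ascii_clean(sample):
    # For such input, decoding as US-ASCII and as Latin-1 yields the same text.
    print(sample.decode("ascii") == sample.decode("latin-1"))
```

A file that passes this check can be given to the formatter as Latin-1 input without conversion.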
The remaining encodings require support that is not built into the GNU troff executable; instead, they use macro packages.
latin2
To use ISO Latin-2, an encoding for Central and Eastern European languages, either use ‘.mso latin2.tmac’ at the very beginning of your document or use ‘-mlatin2’ as a command-line argument to groff.
latin5
To use ISO Latin-5, an encoding for the Turkish language, either use ‘.mso latin5.tmac’ at the very beginning of your document or use ‘-mlatin5’ as a command-line argument to groff.
latin9
ISO Latin-9 is a successor to Latin-1. Its main difference from Latin-1 is that Latin-9 contains the Euro sign. To use this encoding, either use ‘.mso latin9.tmac’ at the very beginning of your document or use ‘-mlatin9’ as a command-line argument to groff.
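To see concretely where Latin-9 departs from Latin-1, one can compare how each byte value decodes under the two encodings. This sketch relies on Python’s standard iso8859-1 and iso8859-15 codecs rather than on groff’s own tables, and the function name latin9_differences is hypothetical.

```python
# Illustrative only: uses Python's codecs, not groff's latin1.tmac/latin9.tmac.
def latin9_differences():
    """Map each byte that decodes differently to (Latin-1 char, Latin-9 char)."""
    diffs = {}
    for byte in range(256):
        raw = bytes([byte])
        as_latin1 = raw.decode("iso8859-1")
        as_latin9 = raw.decode("iso8859-15")
        if as_latin1 != as_latin9:
            diffs[byte] = (as_latin1, as_latin9)
    return diffs


# Print the differing positions as Unicode code points.
for byte, (l1, l9) in latin9_differences().items():
    print(f"0x{byte:02X}: Latin-1 U+{ord(l1):04X}  Latin-9 U+{ord(l9):04X}")
```

Byte 0xA4 is the position that carries the Euro sign (U+20AC) in Latin-9; in Latin-1 the same byte is the generic currency sign (U+00A4).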
Some characters from an input encoding may not be available with a particular output driver, or their glyphs may not have representation in the font used. For terminal devices, fallbacks are defined, like ‘EUR’ for the Euro sign and ‘(C)’ for the copyright sign. For typesetter devices, you may need to “mount” fonts that support glyphs required by the document. See Font Positions.
Due to the importance of the Euro glyph in Europe, groff is distributed with a PostScript font called freeeuro.pfa, which provides various glyph shapes for the Euro. Because standard PostScript fonts contain the other glyphs from Latin-5 and Latin-9 that Latin-1 lacks, these encodings are supported for the ps and pdf output devices as groff ships, while Latin-2 is not.
Unicode supports characters from all other input encodings; the utf8 output driver for terminals therefore does as well. The DVI output driver supports the Latin-2 and Latin-9 encodings if the command-line option -mec is also given.30
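The conversion step performed for the -k option can be pictured as decoding the input and rewriting every non-ASCII character as a Unicode special character escape of the form ‘\[uXXXX]’. The sketch below illustrates that idea only; it is not preconv’s implementation, the function name to_troff_escapes is hypothetical, and the real preprocessor’s encoding detection and other output details are left out.

```python
# Illustrative only: a simplified model of the idea behind preconv,
# not the actual program. to_troff_escapes is a hypothetical name.
def to_troff_escapes(data, encoding):
    """Decode bytes and replace non-ASCII characters with \\[uXXXX] escapes."""
    text = data.decode(encoding)
    pieces = []
    for ch in text:
        if ord(ch) < 0x80:
            pieces.append(ch)  # ASCII passes through unchanged
        else:
            pieces.append("\\[u%04X]" % ord(ch))  # at least four hex digits
    return "".join(pieces)


# The same function handles UTF-8 and Latin-2 input alike.
print(to_troff_escapes("caf\u00e9".encode("utf-8"), "utf-8"))
print(to_troff_escapes(b"\xb3", "iso8859-2"))
```

Because the result is plain ASCII, it is valid input under either of the formatter’s built-in encodings.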