nroff/groff does not properly convert utf-8 encoded file
I have a UTF-8 encoded roff file that I want to convert to a man page with:
$ nroff -mandoc inittab.5
However, characters such as [äöüÄÖÜ] are not displayed properly; it seems that nroff assumes ISO 8859-1 encoding (I am getting [äöüÃÃÃ] instead). Calling nroff with the -Tutf8 flag does not change the behaviour, and the locale environment variables are (I assume properly) set to:
LANG=de_DE.utf8
LC_CTYPE="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_PAPER="de_DE.utf8"
LC_NAME="de_DE.utf8"
LC_ADDRESS="de_DE.utf8"
LC_TELEPHONE="de_DE.utf8"
LC_MEASUREMENT="de_DE.utf8"
LC_IDENTIFICATION="de_DE.utf8"
LC_ALL=
Since nroff is only a wrapper script that eventually calls groff, I checked the call to the latter, which is:
$ groff -Tutf8 -mandoc inittab.5
Comparing the byte encodings of the characters in the source file and the output file, I get the following conversions:
character  src file  output file
---------  --------  -----------
ä          C3 A4     C3 83 C2 A4
ö          C3 B6     C3 83 C2 B6
ü          C3 BC     C3 83 C2 BC
Ä          C3 84     C3 83
Ö          C3 96     C3 83
Ü          C3 9C     C3 83
ß          C3 9F     C3 83
This behaviour seems very weird to me (why am I getting an additional C3 83, and why is the original byte sequence truncated altogether for the capital umlauts and ß?)
Why is this, and how can I make nroff / groff properly convert my UTF-8 encoded file?
EDIT: I am using GNU nroff (groff) version 1.22.2
Unlike other troff implementations (namely Plan 9 and Heirloom troff), groff does not accept UTF-8 directly in input documents. However, UTF-8 documents can still be processed with the preconv(1) preprocessor, which converts the UTF-8 characters in a file to groff's native escape sequences.
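The byte table in the question is consistent with groff reading each input byte as a separate ISO 8859-1 character and then writing that character out as UTF-8. The following Python sketch reproduces the table under that assumption. It is not groff's actual code: the helper names `simulate_latin1_misread` and `hex_bytes` are made up for this illustration, and the rule that bytes in the range 0x80 to 0x9F are silently dropped as invalid input characters is an assumption that happens to match the truncated output for Ä, Ö, Ü and ß.

```python
# Assumption: bytes 0x80-0x9F (the C1 control range in ISO 8859-1) are
# discarded as invalid input characters.
C1_RANGE = range(0x80, 0xA0)


def simulate_latin1_misread(text: str) -> bytes:
    """Mimic a reader that treats each UTF-8 byte as one Latin-1 character."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte in C1_RANGE:
            continue  # assumed to be dropped, which truncates Ä, Ö, Ü and ß
        # Interpret the byte as a Latin-1 code point and emit it as UTF-8.
        out += chr(byte).encode("utf-8")
    return bytes(out)


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for ch in "äöüÄÖÜß":
        src = hex_bytes(ch.encode("utf-8"))
        out = hex_bytes(simulate_latin1_misread(ch))
        print(ch, src, "->", out)
```

Under these assumptions, the leading C3 of every umlaut becomes Ã (C3 83), and the second byte either survives as a two-byte sequence (lowercase umlauts) or disappears (capital umlauts and ß).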
Take for example this groff_ms(7) document:
.TL
StackOverflow Test Document
.AU
ToasterKing
.PP
I like going to the café down the street
äöüÄÖÜ
Using groff normally, we get:
StackOverflow Test Document
ToasterKing
I like going to the café down the street
äöüÃÃÃ
But when using preconv | groff or groff -k, we get:
StackOverflow Test Document
ToasterKing
I like going to the café down the street
äöüÄÖÜ
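Applied to the man page from the question, the same idea amounts to adding -k to the groff call. A minimal Python sketch of scripting that call is shown here; the helper names `build_groff_command` and `render_man_page` are hypothetical, and the sketch assumes a groff version that provides the -k option and that groff is on the PATH.

```python
import shutil
import subprocess


def build_groff_command(path: str) -> list[str]:
    """Build the groff call from the question, with -k added to run preconv."""
    # -Tutf8 and -mandoc are the options already used in the question.
    return ["groff", "-k", "-Tutf8", "-mandoc", path]


def render_man_page(path: str) -> str:
    """Run groff on the given file and return the formatted page as text."""
    if shutil.which("groff") is None:
        raise RuntimeError("groff not found on PATH")
    result = subprocess.run(
        build_groff_command(path), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    print(render_man_page("inittab.5"))
```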
Looking at the output of preconv, you can see how it transforms characters into escape sequences:
.lf 1 so.ms
.TL
StackOverflow Test Document
.AU
ToasterKing
.PP
I like going to the caf\[u00E9] down the street
\[u00E4]\[u00F6]\[u00FC]\[u00C4]\[u00D6]\[u00DC]
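The character substitution visible in this output can be approximated in a few lines of Python. This is only a sketch of the transformation, not preconv's implementation: the function name `to_groff_escapes` is made up here, and the real tool does more (for example, it emits the .lf line shown above and handles input encodings other than UTF-8).

```python
def to_groff_escapes(text: str) -> str:
    # Replace every non-ASCII character with a groff Unicode escape of the
    # form seen in the preconv output above; ASCII passes through unchanged.
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            parts.append(ch)
        else:
            parts.append(f"\\[u{code_point:04X}]")
    return "".join(parts)


if __name__ == "__main__":
    print(to_groff_escapes("I like going to the café down the street"))
    print(to_groff_escapes("äöüÄÖÜ"))
```

Because the result is plain ASCII, groff no longer has to guess the input encoding.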