nroff / groff does not properly convert a UTF-8 encoded file

Question

I have a UTF-8 encoded roff file that I want to convert to a man page with

$ nroff -mandoc inittab.5

However, characters such as [äöüÄÖÜ] are not displayed properly; it seems that nroff assumes ISO 8859-1 encoding (I get [Ã¤Ã¶Ã¼ÃÃÃ] instead). Calling nroff with the -Tutf8 flag does not change the behaviour, and the locale environment variables are (I assume properly) set to

LANG=de_DE.utf8
LC_CTYPE="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_PAPER="de_DE.utf8"
LC_NAME="de_DE.utf8"
LC_ADDRESS="de_DE.utf8"
LC_TELEPHONE="de_DE.utf8"
LC_MEASUREMENT="de_DE.utf8"
LC_IDENTIFICATION="de_DE.utf8"
LC_ALL=

Since nroff is only a wrapper script that eventually calls groff, I checked the call to the latter, which is:

$ groff -Tutf8 -mandoc inittab.5

Comparing the byte encodings of the characters in the source file and the output file, I get the following conversions:

character  src file  output file
---------  --------  -----------
ä          C3 A4     C3 83 C2 A4
ö          C3 B6     C3 83 C2 B6
ü          C3 BC     C3 83 C2 BC
Ä          C3 84     C3 83
Ö          C3 96     C3 83
Ü          C3 9C     C3 83
ß          C3 9F     C3 83

This behaviour seems very weird to me (why am I getting an additional C3 83, and why is the original byte sequence truncated altogether for the uppercase umlauts and ß?)

Why is this, and how can I make nroff / groff properly convert my UTF-8 encoded file?

EDIT: I am using GNU nroff (groff) version 1.22.2

Answer 1

Unlike other troff implementations (namely Plan 9 and Heirloom troff), groff does not support UTF-8 in documents. However, UTF-8 output can be achieved using the preconv(1) preprocessor, which converts UTF-8 characters in a file to groff's native escape sequences.
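The byte table in the question fits this picture. Here is a minimal sketch in Python (not groff's actual code), assuming groff reads each input byte as an ISO 8859-1 character, discards bytes in the range 0x80 to 0x9F as invalid input characters (my reading of the groff documentation), and writes the result as UTF-8. The function name latin1_misread is made up for illustration:

```python
def latin1_misread(text: str) -> bytes:
    """Simulate UTF-8 input being read as ISO 8859-1 and written back as UTF-8."""
    raw = text.encode("utf-8")
    # Assumption: bytes 0x80-0x9F are treated as invalid input and dropped.
    kept = bytes(b for b in raw if not 0x80 <= b <= 0x9F)
    # Each remaining byte becomes one Latin-1 character, then is re-encoded.
    return kept.decode("latin-1").encode("utf-8")


for ch in "äöüÄÖÜß":
    src = ch.encode("utf-8").hex(" ").upper()
    out = latin1_misread(ch).hex(" ").upper()
    print(ch, src, out)
```

Under these assumptions the lead byte C3 is always re-encoded as C3 83. The second byte survives as C2 xx for the lowercase umlauts, but for Ä, Ö, Ü and ß it falls in 0x80 to 0x9F and is dropped, which would explain the truncated sequences.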

Take for example this groff_ms(7) document:

.TL
StackOverflow Test Document
.AU
ToasterKing
.PP
I like going to the café down the street

äöüÄÖÜ

Using groff normally, we get:

                StackOverflow Test Document


                        ToasterKing


     I like going to the cafÃ© down the street

Ã¤Ã¶Ã¼ÃÃÃ

But when using preconv | groff or groff -k, we get:

                StackOverflow Test Document


                        ToasterKing


     I like going to the café down the street

äöüÄÖÜ

Looking at the output of preconv, you can see how it transforms characters into escape sequences:

.lf 1 so.ms
.TL
StackOverflow Test Document
.AU
ToasterKing
.PP
I like going to the caf\[u00E9] down the street

\[u00E4]\[u00F6]\[u00FC]\[u00C4]\[u00D6]\[u00DC]
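To make the transformation concrete, here is a rough Python sketch of the same idea. It is not how preconv is implemented and ignores everything else preconv does (encoding detection, the .lf line, and so on); it only replaces each non-ASCII character with a \[uXXXX] escape built from its code point. The function name to_groff_escapes is hypothetical:

```python
def to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character with a groff \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 0x80 else f"\\[u{ord(ch):04X}]"
        for ch in text
    )


print(to_groff_escapes("I like going to the café down the street"))
print(to_groff_escapes("äöüÄÖÜ"))
```

For the two text lines of the sample document this yields the same escapes as the preconv output above.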
