
Why is rst2html5 messing up encodings?

I have a Python program which is written in UTF-8, as confirmed by PyCharm and Sublime Text. It prints out the pound character, £ (0xC2 0xA3), to a reStructuredText file:

[image: enter image description here]

When I open the reStructuredText file in PyCharm and Sublime Text, it looks fine, and both report it as UTF-8.

The problem comes when I generate HTML out of this file by using rst2html5, with this command:

 rst2html5 --input-encoding=utf-8 --output-encoding=utf-8 foo.rst > foo.html

The HTML claims to be UTF-8, by means of <meta charset="utf-8" />, but the pound characters, £, are now shown as ┬ú. Opening it in Sublime Text as UTF-8 also shows ┬ú instead of £. This is the actual data:

[image: enter image description here]

Any ideas what's going on or how to stop it? Does that look like UTF-8 at all?

The generated file starts like this:

[image: enter image description here]

0xFF 0xFE reminds me of the UTF-16 BOM, but setting the header to <meta charset="utf-16" /> does not solve the problem, and telling a text editor to open the file as UTF-16 still shows the non-ASCII characters broken.
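A quick way to check what the leading bytes of a file imply is to compare them against the standard byte order marks. The following is a minimal sketch; the helper names sniff_bom and sniff_file are made up for illustration, and a BOM is only a hint, not proof of the encoding.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first so that
# UTF-32 LE (FF FE 00 00) is not reported as UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path, probe=4):
    """Read the first few bytes of a file and sniff them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(probe))
```

For a file that starts with 0xFF 0xFE followed by ordinary text, sniff_bom reports "utf-16-le", which matches the suspicion above.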

In case it is relevant, my active Windows code page is 437.

The problem was being caused by PowerShell redirection and not by rst2html5 itself. Running it like this:

 rst2html5 --input-encoding=utf-8 --output-encoding=utf-8 foo.rst foo.html

which produces the same output file as the redirection ( > ) version, worked well, and using the redirection in CMD also worked well.
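If the output has to be captured from stdout, another way to keep the shell out of the picture is to let Python write the raw bytes to the file. This is a minimal sketch, not part of the original fix: run_to_file is a hypothetical helper, and the commented usage assumes rst2html5 can be launched directly by name on your system.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and write its raw stdout bytes to out_path.

    The file is opened in binary mode and handed to the child process
    directly, so no shell decodes or re-encodes the output.
    """
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Usage with the command from the question (assumes rst2html5 is on PATH):
# run_to_file(
#     ["rst2html5", "--input-encoding=utf-8", "--output-encoding=utf-8", "foo.rst"],
#     "foo.html",
# )
```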

If someone has more information about why PowerShell is messing up the encoding, that'd be good to add here.
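One explanation that fits every symptom above: the UTF-8 bytes coming from rst2html5 are decoded as if they were in the console code page (437 here), and the resulting text is then written out as UTF-16 LE with a BOM. That is an assumption about what the PowerShell redirection does, not something confirmed in this post. The sketch below only simulates that assumed pipeline in Python and shows how a file damaged this way could be reversed; simulate_redirect and repair are hypothetical names, and any line-ending changes are ignored.

```python
import codecs


def simulate_redirect(utf8_bytes: bytes, console_cp: str = "cp437") -> bytes:
    """Mimic the assumed behaviour: misread UTF-8 as the console code
    page, then write the text as UTF-16 LE with a BOM."""
    text = utf8_bytes.decode(console_cp)      # C2 A3 becomes the two characters ┬ú
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def repair(broken: bytes, console_cp: str = "cp437") -> bytes:
    """Undo simulate_redirect and return the original UTF-8 bytes."""
    text = broken.decode("utf-16")            # the BOM selects the byte order and is dropped
    return text.encode(console_cp)            # back to the bytes the program actually emitted


original = "£".encode("utf-8")                # b'\xc2\xa3'
broken = simulate_redirect(original)
print(broken[:2].hex())                       # fffe
print(broken.decode("utf-16"))                # ┬ú
print(repair(broken).decode("utf-8"))         # £
```

In code page 437, byte 0xC2 is ┬ and byte 0xA3 is ú, which is exactly the pair shown in place of £, and the 0xFF 0xFE prefix is the UTF-16 LE byte order mark.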

