How do I find the character encoding for a file?
I have an XML file that does not declare its encoding (charset / character encoding / character set / character map / codeset / code page). This is an example of a declaration that does:
<?xml version="1.0" encoding="UTF-8"?>
The XML is generated by a Perl script; the following is an excerpt:
$fileName = $exportDirectory . $fileName;
open FILE, ">$fileName" or die;
The questions:
I tried to use LibXML:
perl -MXML::LibXML -e 'XML::LibXML->load_xml(location => "2.xml")'
2.xml:1364531: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xBF 0x30 0x39 0x20
female presented in spring 09 due t
^
I hope I have supplied sufficient information. Please let me know if further information is needed.
You can use enca or chardet.
You may have to compile enca yourself. As for chardet, there is a chance your distribution's package repository contains a packaged script.
Enca works only for European languages and requires you to tell it which language the file is in. Chardet does worse at differentiating European languages encoded with 8-bit encodings, but performs better with non-European text.
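Before reaching for a detector, it can help to confirm where the file stops being valid UTF-8 and which 8-bit encodings are even possible. The following is only a minimal sketch using the Python standard library, not what enca or chardet do internally; the function names `first_invalid_utf8` and `decodable_as` and the default candidate list are assumptions made for illustration. Note that ISO-8859-1 maps every byte to a character, so decoding cleanly as ISO-8859-1 proves nothing by itself.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start/err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None


def decodable_as(data: bytes, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the candidate encodings (hypothetical default list) under
    which data decodes without error. This only rules encodings out;
    it does not prove that any remaining candidate is the right one."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok
```

To apply this to the file from the question, read it in binary mode (for example `open("2.xml", "rb").read()`) and pass the bytes to both functions. Encodings that survive the second check are candidates to confirm with enca or chardet, or by inspecting the text around the reported offset.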