
UTF8 Decoding Chinese Characters

I am using node.js and express to build an API which transforms Chinese characters into their phonetic spelling (Pinyin), but I am having some UTF-8 decoding/encoding issues. My PHP cURL request to this API encodes the characters like this:

你好 (nǐ​hǎo) => ä½ å¥½

...so I have to decode them in my node application. I am using the following function:

function decode_utf8(s) {
    return decodeURIComponent(escape(s));
}

and it works perfectly fine in most cases. However, I noticed some weird behavior. Here are two inputs, each followed by the value after escape() and the value after decodeURIComponent():

你好 (nǐ​hǎo): ä½ å¥½ => %E4%BD%A0%E5%A5%BD => 你好
你 (nǐ​): ä½ => %E4%BD => URIError: URI malformed

The first one (nǐhǎo) works, but when I use only the first of the two characters (nǐ), it gives me a URIError. How is this possible? The input to the decodeURIComponent() function is exactly the same for the nǐ part, but it only works when combined with another character. What's wrong here?

The string you're trying to decode doesn't resolve to valid UTF-8. Something is wrong with your PHP cURL request because it's not encoding 你 (nǐ) correctly. In UTF-8, a lead byte of E4 starts a three-byte sequence, so %E4%BD is truncated and decodeURIComponent() rejects it. The percent-encoded value should be 你 => %E4%BD%A0
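To make the difference between the two inputs concrete, here is a minimal sketch that reproduces both cases. It reuses the helper from the question (renamed decodeUtf8); tryDecode is a hypothetical wrapper added only for illustration. The last lines show one possible way the third byte could go missing: byte A0 read as Latin-1 is a non-breaking space, which JavaScript's trim() removes. The thread does not confirm that this is what happens in the PHP request, so treat it as an assumption to check, not a diagnosis.

```javascript
// Helper from the question: escape() turns each char code below 256 into
// a %XX byte, and decodeURIComponent() decodes those bytes as UTF-8.
function decodeUtf8(s) {
    return decodeURIComponent(escape(s));
}

// Hypothetical wrapper (not in the original post): returns null instead
// of throwing when the bytes are not valid UTF-8.
function tryDecode(s) {
    try {
        return decodeUtf8(s);
    } catch (e) {
        if (e instanceof URIError) return null;
        throw e;
    }
}

const full = "\u00E4\u00BD\u00A0";   // bytes E4 BD A0 seen as Latin-1 characters
const truncated = "\u00E4\u00BD";    // same input with the third byte missing

console.log(escape(full));           // %E4%BD%A0
console.log(tryDecode(full));        // 你
console.log(escape(truncated));      // %E4%BD
console.log(tryDecode(truncated));   // null (the original helper throws URIError)

// Assumption: some step trims the string. U+00A0 counts as whitespace in
// JavaScript, so trimming silently drops the last byte of the sequence.
console.log(full.trim() === truncated); // true
```

In the two-character input the A0 byte sits in the middle of the string, so trimming would leave it alone, which would be consistent with 你好 decoding while 你 alone fails.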


 