UTF8 Decoding Chinese Characters
I am using node.js and express to build an API which transforms Chinese characters into their phonetic spelling (Pinyin), but I am having some UTF-8 decoding/encoding issues. My PHP Curl request to this API encodes the characters like this:
你好 (nǐhǎo) => ä½ å¥½
...so I have to decode them in my node application. I am using the following function:
function decode_utf8(s) {
    return decodeURIComponent(escape(s));
}
and it works perfectly fine in most cases. However, I noticed some weird behavior. Here are two inputs, each followed by the value after escape() and the value after decodeURIComponent():
你好 (nǐhǎo): ä½ å¥½ => %E4%BD%A0%E5%A5%BD => 你好
你 (nǐ): ä½ => %E4%BD => URIError: URI malformed
The first one (nǐhǎo) works, but when I use only the first of the two characters (nǐ), I get a URIError. How is this possible? The input to the decodeURIComponent() function is exactly the same for the nǐ part, but it only works when combined with another character. What's wrong here?
The string you're trying to decode doesn't resolve to valid UTF-8. Something is wrong with your PHP Curl request, because it's not encoding 你 (nǐ) correctly. The percent-encoded value should be:
你 => %E4%BD%A0
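To make the difference concrete, here is a minimal Node.js sketch that reproduces both cases. It relies on one assumption that the question does not confirm: the third UTF-8 byte of 你 is 0xA0, which reads as a no-break space in Latin-1, so it may have been stripped as trailing whitespace somewhere between PHP and node. That would also explain why 你好 survives, since there the 0xA0 sits in the middle of the string. The names decodeUtf8, mojibake and truncated are illustrative only.

```javascript
// The UTF-8 bytes of 你 are E4 BD A0. Read as Latin-1 they become
// "ä", "½" and a no-break space (U+00A0).
const mojibake = Buffer.from("你", "utf8").toString("latin1");

function decodeUtf8(s) {
  // escape() turns each character code below 256 into %XX, and
  // decodeURIComponent() then decodes those bytes as UTF-8.
  return decodeURIComponent(escape(s));
}

console.log(escape(mojibake));     // %E4%BD%A0
console.log(decodeUtf8(mojibake)); // 你

// Assumption: something upstream dropped the trailing U+00A0,
// for example by trimming whitespace.
const truncated = mojibake.slice(0, 2);
console.log(escape(truncated));    // %E4%BD

// Two bytes of a three-byte sequence are not valid UTF-8,
// so decodeURIComponent() throws.
let error = null;
try {
  decodeUtf8(truncated);
} catch (e) {
  error = e;
}
console.log(error instanceof URIError); // true
```

If that is what is happening, the fix belongs on the sending side: make sure the request body is sent as UTF-8 and is not trimmed or re-encoded on the way. On the node side, Buffer.from(s, "latin1").toString("utf8") performs the same reinterpretation without the deprecated escape(), and it substitutes a replacement character for invalid input instead of throwing.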