Encoding and decoding with Python

Question

I have this function in Python:

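# -*- coding: utf-8 -*-
# Python 2; assumes this file is saved as UTF-8
Str = "Ã¼"
print Str


def correctText(text):
    text = text.upper()  # runs on the raw bytes, before decoding
    # UTF-8 bytes -> unicode -> Windows-1252 bytes: u"Ã¼" becomes C3 BC,
    # which is the UTF-8 encoding of "ü"
    correctedText = text.decode('UTF8').encode('Windows-1252')
    return correctedText

corText = correctText(Str)
print corText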

It works and converts sequences like Ã¼ and Ã© (to ü and é); however, it fails when I try Ã? and Â¶.
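For comparison, here is a minimal sketch of the same conversion in Python 3, assuming Python 3 string semantics (str is Unicode text, bytes is separate). The name correct_text and its errors parameter are illustrative, not from the question. The upper() call is left out: in Python 3 it would act on the mis-decoded characters themselves (turning â into Â, for instance) and could break sequences that were otherwise repairable.

```python
def correct_text(text, errors="strict"):
    """Undo text that was UTF-8 but got decoded as Windows-1252 (hypothetical helper).

    Encoding with cp1252 recovers the original bytes; decoding those bytes
    as UTF-8 yields the intended characters. `errors` goes to bytes.decode.
    """
    return text.encode("cp1252").decode("utf-8", errors)


fixed = correct_text("Ã¼")                # "ü"
lenient = correct_text("Ã?", "replace")   # "\ufffd?" instead of an exception
```

With the default errors="strict", correct_text("Ã?") raises UnicodeDecodeError, which matches the failure described above.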

Is there a way I can fix it?

Answer 1

According to UTF-8, the Windows-1252 bytes of Ã? (C3 3F) do not form a valid character. A UTF-8 character is 1 to 4 bytes long, and every byte after the first must have the bit pattern 10xxxxxx, which 3F does not. What you need to do is either use some other kind of encoding or strip out the errors in your string by using the unicode() function, for example unicode(s, 'utf-8', 'ignore'). I recommend the latter.
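In Python 3, the counterpart of unicode(s, 'utf-8', 'ignore') is bytes.decode('utf-8', errors='ignore'). The sketch below assumes Python 3 and uses a hypothetical helper, decode_utf8, to compare the strict, ignore and replace error handlers on the bytes C3 3F that the question's conversion yields for Ã?.

```python
def decode_utf8(data, errors="strict"):
    """Decode UTF-8 bytes; return None instead of raising (hypothetical helper)."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError:
        return None


# "Ã" is 0xC3 and "?" is 0x3F in Windows-1252.
raw = "Ã?".encode("cp1252")

strict = decode_utf8(raw)               # None: 0xC3 needs a continuation byte
ignored = decode_utf8(raw, "ignore")    # "?": the stray 0xC3 is dropped
replaced = decode_utf8(raw, "replace")  # "\ufffd?": 0xC3 becomes U+FFFD
```

"ignore" silently loses data, while "replace" leaves a visible marker where the broken byte was, which can make bad input easier to track down.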

Answer 2

What you are trying to do is compose valid UTF-8 codes out of several consecutive Windows-1252 codes.

For example, for Ã¼, the Windows-1252 code of Ã is C3 and that of ¼ is BC. Together, C3BC happens to be the UTF-8 code of ü.
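A short Python 3 sketch of this byte arithmetic (the variable names are illustrative):

```python
# Windows-1252 maps each character to one byte: "Ã" -> 0xC3, "¼" -> 0xBC.
raw = "Ã¼".encode("cp1252")
hex_codes = " ".join(f"{b:02X}" for b in raw)  # "C3 BC"

# Read as UTF-8, the same two bytes form a single character.
decoded = raw.decode("utf-8")  # "ü", U+00FC
```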

Now, for Ã?, the Windows-1252 code is C33F, which is not a valid UTF-8 code (because the second byte, 3F, does not start with the bits 10).
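The bit rule in parentheses can be written out as masks. This Python 3 sketch uses hypothetical helper names and only checks the two-byte case relevant here; it is not a full UTF-8 validator.

```python
def is_continuation(byte):
    """True if `byte` has the UTF-8 continuation pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def looks_like_two_byte_utf8(pair):
    """Lead byte 110xxxxx followed by a continuation byte.

    Simplified: it does not reject the overlong lead bytes 0xC0 and 0xC1.
    """
    lead, second = pair
    return (lead & 0b1110_0000) == 0b1100_0000 and is_continuation(second)


good = "Ã¼".encode("cp1252")  # C3 BC: BC = 1011 1100, starts with 10
bad = "Ã?".encode("cp1252")   # C3 3F: 3F = 0011 1111, starts with 00
```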

Are you sure this sequence occurs in your text? For example, for à, the Windows-1252 decoding of its UTF-8 code (C3A0) is Ã followed by an invisible character (a non-breaking space). So, if that second character is not displayed, the ? might be a regular character of the text.
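A Python 3 sketch of this scenario, assuming (hypothetically) that the original text was à followed by a literal ?:

```python
original = "à?"  # hypothetical text: "à" followed by an ordinary "?"

# UTF-8 bytes C3 A0 3F, wrongly decoded as Windows-1252: "Ã", U+00A0, "?"
mojibake = original.encode("utf-8").decode("cp1252")

# With the invisible U+00A0 still present, the repair works.
repaired = mojibake.encode("cp1252").decode("utf-8")

# If the U+00A0 is lost (e.g. it was never copied), "Ã?" remains,
# and its bytes C3 3F are no longer valid UTF-8.
without_nbsp = mojibake.replace("\u00a0", "")
```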

For Â¶, the Windows-1252 encoding is C2B6, which is valid UTF-8 but decodes to the pilcrow ¶ (U+00B6), not to a letter. Shouldn't it be Ã¶, whose Windows-1252 encoding is C3B6, the UTF-8 code of ö?
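Both readings can be checked in Python 3; cp1252_to_utf8 is a hypothetical helper name:

```python
def cp1252_to_utf8(text):
    """Reinterpret the Windows-1252 bytes of `text` as UTF-8 (hypothetical helper)."""
    return text.encode("cp1252").decode("utf-8")


pilcrow = cp1252_to_utf8("Â¶")   # bytes C2 B6 -> U+00B6 "¶"
o_umlaut = cp1252_to_utf8("Ã¶")  # bytes C3 B6 -> U+00F6 "ö"
```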

2 answers

Answer 1
0 2017-07-05 14:59:41

Answer 2
0 2017-07-07 17:41:09
