
Can't handle strings in Windows

I wrote a Python 2.7 script on Linux and it worked fine.

It uses

os.listdir(os.getcwd())

to read folder names into variables, which are used later in the script.

On Linux I used a simple conversion trick to manually convert the non-ASCII characters into ASCII ones.

str(str(tfile)[0:-4]).replace('\xc4\xb0', 'I').replace("\xc4\x9e", 'G').replace("\xc3\x9c", 'U').replace("\xc3\x87", 'C').replace("\xc3\x96", 'O').replace("\xc5\x9e", 'S') ,str(line.split(";")[0]).replace(" ", "").rjust(13, "0"),a)) 

This approach failed on Windows. I tried

udata = str(str(str(tfile)[0:-4])).decode("UTF-8")
asci = udata.encode("ascii","ignore")

which also failed with the following:

DEM¦-RTEPE # the error occurred at this string

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1532, in __call__
    return self.func(*args)
  File "C:\Users\benhur.satir\workspace\Soykan\tkinter.py", line 178, in SparisDerle
    udata = str(str(str(tfile)[0:-4])).decode("utf=8")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 3: invalid start byte

How can I handle such characters on Windows?

NOTE: Leaving them as UTF-8 causes the xlswriter module to fail, so I need to convert them to ASCII. Losing some characters is undesirable but acceptable.

Windows does not use UTF-8 for file names by default. You probably get the folder names in the default system encoding, generally win1252 (a variant of ISO-8859-1).
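One way to avoid guessing the code page is to ask for text instead of bytes. In Python 2, os.listdir() returns unicode objects when it is given a unicode path. The sketch below relies on that; the helper name list_names_as_text is hypothetical, and the fallback to UTF-8 when the filesystem encoding is unknown is an assumption.

```python
# -*- coding: utf-8 -*-
import os
import sys


def list_names_as_text(path=u"."):
    """Return directory entries as text (unicode) rather than raw bytes.

    Python 2: os.listdir() yields unicode names when given a unicode path.
    Python 3: names are already text when the path is text.
    """
    if isinstance(path, bytes):
        # Assumption: a byte-string path uses the filesystem encoding
        # ('mbcs' on Windows under Python 2); fall back to UTF-8 if unknown.
        path = path.decode(sys.getfilesystemencoding() or "utf-8")
    return os.listdir(path)
```

With text names in hand, the .decode("UTF-8") step from the question is no longer needed; only the final conversion to ASCII remains.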

That's why you could not find UTF-8 byte sequences in the file names. By the way, the exception says you hit a byte with value 0xa6, which in win1252 is ¦ (broken bar).

That does not tell you exactly which encoding your Windows system uses, since it may depend on the localization, but it proves the data is not UTF-8 encoded.
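To check this kind of claim yourself, you can try a few candidate encodings on the offending bytes. This is only a diagnostic sketch: the helper try_decodings is hypothetical, the sample bytes assume the name starts with "DEM" followed by 0xa6 (as the traceback suggests), and cp1254 (Windows Turkish) is a guess based on the characters in the question.

```python
# -*- coding: utf-8 -*-

def try_decodings(data, encodings=("utf-8", "cp1252", "cp1254")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding.
            results[name] = None
    return results


# Assumed sample: 0xa6 at position 3, as reported in the traceback.
sample = b"DEM\xa6"
results = try_decodings(sample)
for name in sorted(results):
    print("%s -> %r" % (name, results[name]))
```

A successful decode only shows that the bytes are valid in that code page, not that it is the right one; several single-byte code pages will accept the same input.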

How about this? Instead of (or in addition to) the chained .replace() calls, you can filter against a set of allowed characters. The string module provides ready-made character sets for that.

>>> import string
>>> string.digits+string.punctuation
'0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> 
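A minimal sketch of that idea, assuming the names have already been decoded to unicode text. The function to_ascii and the ALLOWED set are hypothetical; the NFKD normalization step is an addition of this sketch (not part of the answer above) so that accented letters become their base letter instead of disappearing.

```python
# -*- coding: utf-8 -*-
import string
import unicodedata

# Characters kept in the output; allowing the space is a choice of this sketch.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def to_ascii(text):
    """Strip accents where possible, then drop anything outside ALLOWED.

    Expects unicode text, not an undecoded byte string.
    """
    # Dotless i (U+0131) has no Unicode decomposition, so map it by hand.
    text = text.replace(u"\u0131", u"i")
    # NFKD splits e.g. U+0130 into "I" plus a combining dot above.
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks and other non-ASCII characters are dropped here.
    return u"".join(ch for ch in decomposed if ch in ALLOWED)
```

Characters with no ASCII base letter are silently dropped, which matches the "missing characters are acceptable" constraint in the question.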
