How do I decode the bytes in a list in Python?

Question

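I am using Python 2.7.8 and trying to use the stem(param) method of NLTK's ISRIStemmer to get the root of each word. However, the list I pass in contains raw byte strings (they show up as hexadecimal escapes), and I get an error when I run the program. Here is the code: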

    from nltk.stem.isri import ISRIStemmer
    st = ISRIStemmer() 
    f=open("Hassan.txt","rU")
    text=f.read()
    text1=text.split()
    # numOfWords is a variable that holds the number of words in the list text1
    for i in range(1, numOfWords):
        print st.stem(text1[i])

The output is as follows:

    Warning (from warnings module):
      File "C:\Python27\lib\site-packages\nltk\stem\isri.py", line 154
        if token in self.stop_words:
    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

    Traceback (most recent call last):
      File "C:\Python27\Lib\mycorpus.py", line 81, in <module>
        print st.stem(text1[i])
      File "C:\Python27\lib\site-packages\nltk\stem\isri.py", line 156, in stem
        token = self.pre32(token)     # remove length three and length two prefixes in this order
      File "C:\Python27\lib\site-packages\nltk\stem\isri.py", line 198, in pre32
        if word.startswith(pre3):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 0: ordinal not in range(128)

How can I solve this problem?

Answer 1

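You need to decode the text you read from the file, so that the stemmer receives Unicode strings instead of byte strings. Assuming your file is encoded as UTF-8: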

    text=f.read().decode('utf-8')
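As a slightly fuller sketch of the same idea, the helpers below decode either while reading the file or after the fact, when you already hold a list of byte strings. The names read_words and decode_words are made up for illustration, and the default encoding is an assumption. If the file turns out not to be UTF-8 (a file saved on Windows might use a legacy Arabic code page such as cp1256, for example), pass that encoding instead.

```python
# -*- coding: utf-8 -*-
import io


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of a file as Unicode strings.

    Hypothetical helper: io.open decodes while reading and behaves the
    same way on Python 2.7 and Python 3.
    """
    with io.open(path, "r", encoding=encoding) as f:
        return f.read().split()


def decode_words(raw_words, encoding="utf-8"):
    """Decode a list of byte strings that has already been read.

    Hypothetical helper: the encoding must match how the bytes were
    produced, otherwise decode() raises UnicodeDecodeError.
    """
    return [w.decode(encoding) for w in raw_words]


# Possible usage with the stemmer from the question (not run here):
#     for word in read_words("Hassan.txt"):
#         print(st.stem(word))
```

Either way, every item handed to st.stem() is then a Unicode string, so the comparisons inside the stemmer no longer fall back to the ascii codec.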
