
Why can't I use the decode function on a string?

I'm researching a dataset and rerunning my coworker's code. When tokenizing text data, the code shown below doesn't work on my MacBook, although it worked well on my coworker's computer. Here is the code.

I don't know which version he has, but mine is Python 3.6. Is this a problem caused by the different versions?

s=title+' '+author+' '+text
tokens=word_tokenize(s.decode('ascii','ignore').lower())
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-e50403f82604> in <module>
     10         flushPrint(m/100)#208
     11     s=title+' '+author+' '+text
---> 12     tokens=word_tokenize(s.decode('ascii','ignore').lower())
     13     tokens = [z for z in tokens if not z in stopset and len(z)>1]
     14     k=[]

AttributeError: 'str' object has no attribute 'decode'

The issue is most probably due to the changes between Python 2 and Python 3.

In Python 2:

  • '' is of type str (a byte string) and thus supports ''.decode()
  • u'' is of type unicode and thus supports u''.encode()

In Python 3 the types changed:

  • '' is of type str, which is now Unicode text, and thus supports ''.encode() but not ''.decode()
  • u'' is also of type str; the u prefix is accepted but has no effect
  • b'' is of type bytes and thus supports b''.decode()
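The Python 3 behaviour can be checked directly in an interpreter. This is a minimal sketch; the sample string 'café' is made up for illustration and is not from the question's dataset.

```python
# Python 3: text is str (Unicode), binary data is bytes.
text = 'café'                      # made-up sample value
data = text.encode('utf-8')        # str -> bytes

print(type(text).__name__)         # str
print(type(data).__name__)         # bytes
print(hasattr(text, 'decode'))     # False: why s.decode(...) raises AttributeError
print(hasattr(data, 'decode'))     # True
print(type(u'').__name__)          # str: the u prefix changes nothing in Python 3

# Decoding as ASCII with errors='ignore' drops the non-ASCII bytes.
ascii_only = data.decode('ascii', 'ignore')
print(ascii_only)                  # caf
```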

So in your case it depends on the type of your variables. If they are already str, there is nothing to decode. If title, author and text are bytes, you might have to do something like

s = title + b' ' + author + b' ' + text

or just resort to Python 2 :)
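If the variables turn out to be str, one way to keep the original intent (drop non-ASCII characters, then lower-case) under Python 3 is to round-trip through bytes. The sketch below is only an illustration: to_ascii_lower is a hypothetical helper, the sample values are made up, and str.split() stands in for word_tokenize so that the snippet runs without NLTK installed.

```python
def to_ascii_lower(value):
    """Return value as lower-cased ASCII text, dropping other characters.

    Hypothetical helper, not part of the original code; accepts bytes or str.
    """
    if isinstance(value, bytes):
        # bytes still has .decode() in Python 3
        return value.decode('ascii', 'ignore').lower()
    # str has no .decode(); encode to ASCII (dropping the rest), then decode back
    return value.encode('ascii', 'ignore').decode('ascii').lower()


# Made-up sample values standing in for the dataset fields.
title, author, text = 'Café Notes', 'Zoë', 'Some TEXT'
s = title + ' ' + author + ' ' + text

cleaned = to_ascii_lower(s)
tokens = cleaned.split()   # stand-in for word_tokenize(cleaned)
print(tokens)              # ['caf', 'notes', 'zo', 'some', 'text']
```

In the original loop, the equivalent change would be to pass the cleaned string to word_tokenize instead of calling s.decode(...) directly.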


 