Python spaCy errors when nlp is called: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2
Python 3.6: I am using spaCy on a column of text in a pandas DataFrame. The text does contain "special characters" and I need to keep them. nlp requires Unicode input for some reason. I am getting an error from nlp with the code below. Any help would be very much appreciated.
# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load("en_core_web_sm")
df['TextCol'] = df['TextCol'].str.encode('utf-8')

def function(row):
    text = row['TextCol']
    doc = nlp(unicode(text))

df.apply(function, axis=1)
The error returned from nlp:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2
So I solved my own question. I'm not really sure what changed; I switched IDEs from PyCharm to Eclipse (PyDev), but I am still using the same interpreter. Here are the changes; it looks like pretty standard usage.
# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load("en_core_web_sm")

# Removed the encode step
# df['TextCol'] = df['TextCol'].str.encode('utf-8')

def function(row):
    text = row['TextCol']
    # Removed the unicode() call
    doc = nlp(text)

df.apply(function, axis=1)
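For context: in Python 3, a pandas text column already holds Unicode `str` objects, which is what spaCy's `nlp()` expects, so the extra `encode`/`unicode()` round-trip is what introduces raw bytes. A minimal sketch of the difference (assuming only pandas; the column name `TextCol` is taken from the question):

```python
import pandas as pd

# Sample text containing a "special character" (a curly apostrophe).
df = pd.DataFrame({"TextCol": ["it\u2019s fine"]})

# In Python 3 the column already holds Unicode str objects,
# which is exactly what spaCy's nlp() expects.
original = df["TextCol"].iloc[0]
print(type(original))  # <class 'str'>

# .str.encode('utf-8') converts each value to raw bytes instead.
encoded = df["TextCol"].str.encode("utf-8")
print(type(encoded.iloc[0]))  # <class 'bytes'>

# 0xe2 from the traceback is the lead byte of many three-byte UTF-8
# sequences, e.g. the curly apostrophe U+2019:
print("\u2019".encode("utf-8"))  # b'\xe2\x80\x99'
```

Passing those `bytes` values (or trying to decode them with the default ASCII codec) is a typical way to hit `'ascii' codec can't decode byte 0xe2`; leaving the column as `str` avoids it.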