Python 3.6: I am using spaCy on a column of text in a pandas DataFrame. The text contains special characters, and I need to keep them. nlp seemed to require unicode input for some reason. I am getting the error shown below from nlp:
Any help would be very much appreciated.
# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load("en_core_web_sm")

df['TextCol'] = df['TextCol'].str.encode('utf-8')

def function(row):
    doc = nlp(unicode(row['TextCol']))

df.apply(function, axis=1)
Error returned by nlp:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2
So I solved my own question. I'm not really sure what changed: I switched IDEs from PyCharm to Eclipse (PyDev), but I am still using the same interpreter. Here are the changes, which look like pretty standard usage.
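# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load("en_core_web_sm")

# Removed encode: in Python 3 the text column should already hold str (Unicode) values
# df['TextCol'] = df['TextCol'].str.encode('utf-8')

def function(row):
    # Removed unicode: pass the str from the row straight to nlp
    doc = nlp(row['TextCol'])
    return doc

df.apply(function, axis=1)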
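For anyone wondering why dropping the encode step helps, here is a minimal sketch using only the standard library. It assumes the failing run ended up decoding the UTF-8 bytes as ASCII somewhere along the way (Python 2's unicode() does this by default for byte strings); that is a guess about the cause, not something confirmed above. The sample text and the helper decode_as_ascii are made up for illustration.

```python
# Hypothetical sample value standing in for one cell of df['TextCol'].
text = "it\u2019s fine"  # contains a right single quotation mark (U+2019)

# In Python 3, str is already Unicode, so no conversion is needed before nlp().
assert isinstance(text, str)

# .str.encode('utf-8') turns each str into bytes; U+2019 becomes e2 80 99.
encoded = text.encode("utf-8")


def decode_as_ascii(data):
    """Return the error message raised when decoding bytes as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Assumption: this mimics what an implicit ASCII decode of the bytes would do.
error_message = decode_as_ascii(encoded)
print(error_message)

# Decoding with the matching codec recovers the original text unchanged.
round_tripped = encoded.decode("utf-8")
print(round_tripped == text)
```

So the special characters are kept either way; the encode step only converts the column to bytes, which then have to be decoded again with the right codec before they can be treated as text.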