
Python spaCy error when nlp is called: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2

Python 3.6: I am using spaCy on a column of text in a pandas DataFrame. The text contains special characters, and I need to keep them. nlp seems to require unicode input for some reason. When I call nlp I get the error shown after the code.

Any help would be very much appreciated.

# -*- coding: utf-8 -*-
import spacy
nlp = spacy.load("en_core_web_sm")

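# Encode the column to UTF-8 bytes
df['TextCol'] = df['TextCol'].str.encode('utf-8')
def function(row):
    # Convert the cell back to unicode before passing it to nlp
    doc = nlp(unicode(row['TextCol']))

df.apply(function, axis=1)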

Error returned by nlp:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2

I solved my own problem. I am not sure exactly what changed: I switched IDEs from PyCharm to Eclipse (PyDev), but I am still using the same interpreter. Here are the changes, which look like pretty standard usage:

# -*- coding: utf-8 -*-
import spacy
nlp = spacy.load("en_core_web_sm")

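# Removed encode: leave the column as str so nlp receives text, not bytes
# df['TextCol'] = df['TextCol'].str.encode('utf-8')
def function(row):
    # Removed unicode: pass the cell's text straight to nlp
    doc = nlp(row['TextCol'])
    return doc

df.apply(function, axis=1)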
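
For anyone hitting the same message, here is a small sketch of what was probably going on. It is an assumption on my part, not something confirmed above: encoding the column turns each value into UTF-8 bytes, and anything that later decodes those bytes with the ASCII codec (which is what Python 2's unicode() does by default) fails on the first non-ASCII byte. The sample string and the ensure_text helper are made up for illustration, and the example needs neither spaCy nor pandas.

```python
# A right single quotation mark (U+2019) is a typical "special character".
text = "It\u2019s fine"

# Encoding to UTF-8 produces bytes; U+2019 becomes b"\xe2\x80\x99".
encoded = text.encode("utf-8")

# Decoding those bytes as ASCII fails on the 0xe2 byte.
try:
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)


def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return str, decoding first if value is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both the original str and the encoded bytes come back as the same text.
print(ensure_text(text) == ensure_text(encoded))
```

If a column has already been encoded somewhere upstream, a helper like this could be applied to each value before calling nlp. Otherwise the simpler option is the one above: never encode the column in the first place.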
