
UnicodeDecodeError when using apply with a function on each row of a column

I have a DataFrame and I want to encode each word in one of its columns using Soundex. I have to split each value first, because Soundex encodes only the first word.

I applied this line of code, but I got the following error:

table['soundex'] = table['name'].apply(lambda x:' '.join([jellyfish.soundex(i) for i in x.split()]))

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
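
For context, the message itself can be reproduced without pandas or jellyfish. The following is a minimal sketch, assuming only that some bytes starting with 0xCE were decoded as UTF-8 somewhere along the way; the byte string `broken` is a hypothetical example, and the snippet does not show what jellyfish does internally.

```python
# 0xCE is a UTF-8 lead byte that opens a two-byte sequence.
# For example, the Greek letter alpha is encoded as 0xCE 0xB1.
alpha_bytes = "α".encode("utf-8")

# Hypothetical broken input: the lead byte 0xCE followed by a plain
# ASCII "A" instead of a continuation byte in the range 0x80-0xBF.
broken = b"\xceA"

try:
    broken.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(alpha_bytes)
print(message)
```

A lead byte of 0xCE therefore suggests that the column contains at least one non-ASCII character, such as a Greek letter, rather than a problem with the column's data type.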

When I apply the same code to other columns it works, and they all have the same data type.

My data source is a database, and I created the name column through cleansing steps; that is, it does not come directly from the data source.

Most of the solutions for UnicodeDecodeError deal with reading CSV files, and in my case I do not know what causes the error.

Random sample of data and expected output:

name                       soundex
hospital food              H213 F300
good after noon            G300 A136 N500
hi                         H000

Any help?

I solved it by removing the non-English (non-ASCII) characters with this line of code:

table['name'] = table['name'].str.encode('ascii', 'ignore').str.decode('ascii')
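
To see what this does to each value, here is a minimal sketch using plain Python strings instead of a pandas column. The helper `strip_non_ascii` and the sample list `names` are hypothetical and only illustrate the idea; listing the affected values first also helps confirm which rows are involved.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values; the last two contain non-ASCII characters.
names = ["hospital food", "good after noon", "αβγ hi", "café"]

# Values that contain at least one non-ASCII character.
suspects = [n for n in names if not n.isascii()]

# The same values after dropping the non-ASCII characters.
cleaned = [strip_non_ascii(n) for n in names]

print(suspects)
print(cleaned)
```

Note that this approach discards information: a word written entirely in non-ASCII characters disappears, and an accented word such as "café" is shortened to "caf", so its Soundex code may differ from that of "cafe".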

Reference:

https://stackoverflow.com/a/56744855/10718214
