encode and decode - Hebrew string looks like gibberish - Python 3 from Teradata
I connected to my Teradata DB and extracted a simple query result into a dataframe. The dataframe (df_clients) has 1 column (FirstName) and 1 row containing a Hebrew string ('שלום לך'). When I print the dataframe I get gibberish instead of Hebrew: ùìåí ìê
I found a solution that encodes and then decodes the string:
strr = "ùìåí ìê"
print( strr) #not good
print( strr.encode('cp1252').decode('cp1255',errors='replace')) #good
It worked, but when I tried the same solution on the Pandas dataframe it didn't work (no error, but the output is unchanged):
df_clients.FirstName.apply(lambda x : x.encode('cp1252').decode('cp1255',errors='replace') )
The equivalent of your string encoding in Pandas is .str.encode and .str.decode. I have generated an example with the same string as a demo:
import pandas as pd
dataf = pd.DataFrame({
'name':["ùìåí ìê", "ùìåí ìê"]
})
dataf["name"].str.encode('cp1252').str.decode('cp1255',errors='replace')
# result
# 0 שלום לך
# 1 שלום לך
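One possible reason the attempt in the question appeared to do nothing: neither apply nor the .str methods change the column in place. They return a new Series, which has to be assigned back. A minimal sketch, assuming a hand-built df_clients as a stand-in for the frame returned by the Teradata query:

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by the Teradata query.
df_clients = pd.DataFrame({"FirstName": ["ùìåí ìê"]})

# apply() builds a new Series; df_clients itself is not modified by this line.
fixed = df_clients["FirstName"].apply(
    lambda x: x.encode("cp1252").decode("cp1255", errors="replace")
)
unchanged = df_clients["FirstName"].iloc[0]  # still the garbled text

# Assigning the result back is what actually updates the column.
df_clients["FirstName"] = fixed
print(df_clients)
```

With the assignment in place, the original apply-based approach and the .str version should give the same result.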
To assign the result back, here is a solution that can handle all rows and columns:
import pandas as pd
dataf = pd.DataFrame({
'first_name':["ùìåí ìê", "ùìåí ìê"], 'last_name': ["ùìåí ìê", "ùìåí ìê"]
})
# all columns [assuming all are text]
dataf = dataf.transform(lambda x: x.str.encode('cp1252', errors='replace').str.decode('cp1255', errors='replace'))
# or selecting subset of columns
dataf[["first_name","last_name"]] = dataf[["first_name","last_name"]].transform(lambda x: x.str.encode('cp1252', errors='replace' ).str.decode('cp1255',errors='replace'))
dataf
# first_name last_name
# 0 שלום לך שלום לך
# 1 שלום לך שלום לך
If there are errors, try setting errors='ignore' and see what worked and which characters are still gibberish.
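To see how the errors modes differ, here is a small plain-Python sketch. The helper name fix_mojibake and the second sample string (which already contains a real Hebrew letter and so cannot be encoded as cp1252) are assumptions made up for illustration:

```python
def fix_mojibake(text, errors="strict"):
    """Re-interpret cp1252-decoded text as cp1255 (hypothetical helper)."""
    return text.encode("cp1252", errors=errors).decode("cp1255", errors=errors)

# The first sample is fully garbled; the second mixes in a genuine Hebrew
# letter, which has no cp1252 byte and therefore fails in strict mode.
samples = ["ùìåí ìê", "ùìåí א"]

for s in samples:
    try:
        print(repr(fix_mojibake(s)))
    except (UnicodeEncodeError, UnicodeDecodeError) as exc:
        print("cannot convert", repr(s), "->", exc.reason)
        # 'ignore' silently drops the offending character,
        # 'replace' leaves a visible placeholder in its position.
        print("  ignore :", repr(fix_mojibake(s, "ignore")))
        print("  replace:", repr(fix_mojibake(s, "replace")))
```

Running in strict mode first shows which values fail at all, and comparing 'ignore' with 'replace' shows where the problem characters sit.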