Encoding and decoding - Hebrew strings look like gibberish - Python 3 with Teradata

Question

I connected to my Teradata DB and extracted a simple query result into a dataframe. The dataframe (df_clients) has 1 column (FirstName) and 1 row containing a Hebrew string ('שלום לך'). When I print the dataframe I get gibberish instead of Hebrew: ùìåí ìê

I found a solution that encodes and then decodes the string:

strr = "ùìåí ìê"

print( strr) #not good

print( strr.encode('cp1252').decode('cp1255',errors='replace')) #good

It worked, but when I tried the same solution on the Pandas dataframe it did not work (no error, but it doesn't work):

df_clients.FirstName.apply(lambda x : x.encode('cp1252').decode('cp1255',errors='replace') )

Answer 1

The Pandas equivalent of your string encoding is .str.encode and .str.decode. I built a small example from the same string as a demo:

import pandas as pd

dataf = pd.DataFrame({
    'name':["ùìåí ìê", "ùìåí ìê"]
})

dataf["name"].str.encode('cp1252').str.decode('cp1255',errors='replace')

# result
# 0    שלום לך
# 1    שלום לך
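
As a hedged illustration of why this round trip works: the gibberish is consistent with Hebrew bytes in cp1255 having been read as cp1252 somewhere between the database and Python. Where exactly that happens in the Teradata setup is an assumption here, not something the question confirms. A minimal sketch in plain Python that reproduces the garbling and then reverses it:

```python
# The Hebrew text as it should appear.
original = "שלום לך"

# Simulate the suspected fault: cp1255 bytes interpreted as cp1252.
garbled = original.encode("cp1255").decode("cp1252")
print(garbled)  # ùìåí ìê

# Reverse it: recover the raw bytes with cp1252, then read them as cp1255.
repaired = garbled.encode("cp1252").decode("cp1255")
print(repaired)
```

The .str.encode / .str.decode chain above applies the same two steps to every element of the column.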

Update

A solution that assigns the result back and can handle all rows and columns.

import pandas as pd

dataf = pd.DataFrame({
    'first_name':["ùìåí ìê", "ùìåí ìê"], 'last_name': ["ùìåí ìê", "ùìåí ìê"]
})

# all columns [assuming all are text]

dataf = dataf.transform(lambda x: x.str.encode('cp1252', errors='replace').str.decode('cp1255', errors='replace'))

# or selecting subset of columns

dataf[["first_name","last_name"]] = dataf[["first_name","last_name"]].transform(lambda x: x.str.encode('cp1252', errors='replace' ).str.decode('cp1255',errors='replace'))

dataf

#    first_name last_name
# 0 שלום לך שלום לך
# 1 שלום לך שלום לך

If there are errors, try setting errors='ignore' and see what worked and which characters are still gibberish.
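
One possible way to see which values are still a problem, instead of silently replacing or ignoring characters, is to run the round trip in strict mode and flag the values that fail. The sketch below is only an illustration: repair_hebrew is a hypothetical helper name, and the default code pages are the ones used in this thread.

```python
def repair_hebrew(value, wrong="cp1252", right="cp1255"):
    """Return (text, ok). Hypothetical helper, not part of pandas or Teradata."""
    if not isinstance(value, str):
        # Leave NULLs and non-text values untouched.
        return value, True
    try:
        # Strict mode: any character that does not fit raises an error.
        return value.encode(wrong).decode(right), True
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Keep the original value and flag it for manual inspection.
        return value, False


samples = ["ùìåí ìê", "שלום", None]
for sample in samples:
    print(repr(sample), "->", repair_hebrew(sample))
```

Values that come back with ok set to False are the ones to inspect by hand. Text that is already correct Hebrew lands there too, because it cannot be encoded as cp1252. On a dataframe, the helper could be applied per column, for example with Series.map, and the result assigned back.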

Solution 1
0 2021-05-13 12:10:18
