I have a CSV file with this content. As you can see, some of the field values are not plain strings. I read the file using this command:
data = gpd.read_file('data.csv', encoding='utf8')
The CSV file:
Notebook:
As you can see, the name column is still not decoded. I tried the following command, but it did not work: each value in the column is already a str, so the decode() function cannot be called on it.
data['name'] = data['name'].apply(lambda x:x.decode('utf8', 'strict') if not isinstance(x, str) else x)
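A quick check helps explain why the command above leaves the column unchanged. This is only a sketch: the sample value below is a hypothetical cell, assuming the column holds the printed form of a bytes object as text.

```python
# Hypothetical cell value: the text "b'...'" stored as an ordinary str,
# not an actual bytes object.
value = r"b'\xd9\x85'"

is_text = isinstance(value, str)            # the isinstance() guard is always true
looks_like_bytes_repr = value.startswith("b'") and value.endswith("'")
has_decode = hasattr(value, "decode")       # str has no decode() in Python 3

print(is_text, looks_like_bytes_repr, has_decode)
```

Because every value is already a str, the lambda always takes the else branch and returns the value untouched.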
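This works: strip the leading b' and the trailing quote, turn the escape sequences back into bytes, and decode those bytes as UTF-8: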
data['name'] = data['name'].apply(
    lambda x: x[2:-1].encode().decode("unicode_escape").encode("raw_unicode_escape").decode()
)
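A possible alternative, not taken from the answer above, is to let Python parse the bytes literal with ast.literal_eval and then decode the result. The helper name decode_bytes_repr is hypothetical, and the sketch assumes the affected cells look exactly like b'...' and contain valid UTF-8; anything else is returned unchanged.

```python
import ast


def decode_bytes_repr(value):
    # Only touch strings that look like the printed form of a bytes object.
    if isinstance(value, str) and value[:2] in ("b'", 'b"'):
        try:
            parsed = ast.literal_eval(value)  # safely evaluates literals only
        except (ValueError, SyntaxError):
            return value  # not a valid literal, leave it as it is
        if isinstance(parsed, bytes):
            return parsed.decode("utf8")
    return value


sample = r"b'\xd9\x85\xd9\x86\xd8\xaa\xd8\xb2\xd9\x87\xd8\xb1\xd8\xa7\xd8\xa8'"
decoded = decode_bytes_repr(sample)
print(decoded)
```

Under the same assumptions it could be applied to the column with data['name'] = data['name'].apply(decode_bytes_repr), which also leaves rows that are already plain text alone.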
In:
x = r"b'\xd9\x85\xd9\x86\xd8\xaa\xd8\xb2\xd9\x87\xd8\xb1\xd8\xa7\xd8\xa8'"
print(f"x {type(x)}\n\t= {x}\n")
x = x[2:-1]
print(f"x[2:-1] {type(x)}\n\t= {x}\n")
x = x.encode()
print(f"x[2:-1].encode() {type(x)}\n\t= {x}\n")
x = x.decode("unicode_escape").encode('raw_unicode_escape')
print(f"x[2:-1].encode().decode('unicode_escape').encode('raw_unicode_escape') {type(x)}\n\t= {x}\n")
x = x.decode()
print(f"x[2:-1].encode().decode('unicode_escape').encode('raw_unicode_escape').decode() {type(x)}\n\t= {x}\n")
Out:
x <class 'str'>
= b'\xd9\x85\xd9\x86\xd8\xaa\xd8\xb2\xd9\x87\xd8\xb1\xd8\xa7\xd8\xa8'
x[2:-1] <class 'str'>
= \xd9\x85\xd9\x86\xd8\xaa\xd8\xb2\xd9\x87\xd8\xb1\xd8\xa7\xd8\xa8
x[2:-1].encode() <class 'bytes'>
= b'\\xd9\\x85\\xd9\\x86\\xd8\\xaa\\xd8\\xb2\\xd9\\x87\\xd8\\xb1\\xd8\\xa7\\xd8\\xa8'
x[2:-1].encode().decode('unicode_escape').encode('raw_unicode_escape') <class 'bytes'>
= b'\xd9\x85\xd9\x86\xd8\xaa\xd8\xb2\xd9\x87\xd8\xb1\xd8\xa7\xd8\xa8'
x[2:-1].encode().decode('unicode_escape').encode('raw_unicode_escape').decode() <class 'str'>
= منتزهراب