How can I decode this string in Python?

I downloaded a dataset of Facebook messages, and the text in it is formatted like this:

f\u00c3\u00b8rste student

It's supposed to be første student, but I can't seem to decode it correctly.

I tried:

s = 'f\u00c3\u00b8rste student'
print(s)
# fÃ¸rste student

print(s.encode('utf-8'))
# b'f\xc3\x83\xc2\xb8rste student'

But it didn't work.

To undo whatever encoding foul-up has taken place, first convert the characters to the bytes with the same ordinals by encoding as ISO-8859-1 (Latin-1), then decode those bytes as UTF-8:

>>> 'f\u00c3\u00b8rste student'.encode('iso-8859-1').decode('utf-8')
'første student'
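
If many strings need the same repair, the one-liner can be wrapped in a small helper. The following is only a sketch: the name fix_mojibake is made up for illustration, and it assumes the damage is exactly "UTF-8 bytes that were decoded as Latin-1", which is what the sample string suggests. Strings that do not fit that pattern are returned unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Hypothetical helper, not part of any library.
    """
    try:
        # Latin-1 maps code points U+0000..U+00FF to the same byte values,
        # so encoding with it recovers the original UTF-8 bytes.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8: assume it was not damaged this way and keep it.
        return text


# How the damage arises: correct text, encoded as UTF-8, decoded as Latin-1.
garbled = "første student".encode("utf-8").decode("iso-8859-1")
print(garbled == "f\u00c3\u00b8rste student")  # True
print(fix_mojibake(garbled))
```

One caveat: a correct string that happens to contain only Latin-1 characters forming a valid UTF-8 byte sequence (a literal "Ã¸", say) would also be converted, so it is safest to apply this only to data known to be affected.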
