How do I decode this string in Python?

Question

I downloaded a dataset of Facebook messages, and it was formatted like this:

f\u00c3\u00b8rste student

It's supposed to be første student, but I can't seem to decode it correctly.

I tried:

s = 'f\u00c3\u00b8rste student'
print(s)
# fÃ¸rste student

s = 'f\u00c3\u00b8rste student'
print(s.encode('utf-8'))
# b'f\xc3\x83\xc2\xb8rste student'

But it didn't work.

Answer 1

The string looks like UTF-8 bytes (0xC3 0xB8 is ø in UTF-8) that were read back one byte per character. To undo that encoding mix-up, first convert the characters back to the bytes with the same ordinals by encoding as ISO-8859-1 (Latin-1), then decode those bytes as UTF-8:

>>> 'f\u00c3\u00b8rste student'.encode('iso-8859-1').decode('utf-8')
'første student'
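
If the dataset mixes damaged and undamaged strings, the same round trip can be wrapped in a small helper. This is only a sketch: the name fix_mojibake is made up for illustration, and it assumes that any string which fails the round trip was not damaged in this particular way and should be left alone.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were mis-read as Latin-1."""
    try:
        # Map each character back to the byte with the same ordinal,
        # then interpret the resulting bytes as UTF-8.
        return text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1, or the
        # bytes are not valid UTF-8. Assumption: such text is already
        # fine, so return it unchanged.
        return text


print(fix_mojibake('f\u00c3\u00b8rste student'))
```

A string that is already correct, such as 'første student', fails the UTF-8 decode step and comes back unchanged, so the helper can be applied to every message without corrupting the good ones. Pure ASCII text passes through the round trip unchanged as well.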

Answer 1: 4, accepted, 2018-12-03 22:16:23
