Reading a byte string from a file in Python 3

Question

The file is encoded as UTF-8, and its content looks like this:

cd232704-a46f-3d9d-97f6-67edb897d65f    b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'

Here is my code:

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])

I want to get the correct text: "this Friday, Gerda Scheuers will be excited - but she's most excited about the merchandise the movie will bring."

Decoding a bytes literal written directly in the source works:

print(b'\xe2\x80\x94'.decode('utf-8'))  # decodes the UTF-8 bytes; prints an em dash (U+2014)

But I can't read the bytes from the file. If I open the file in binary mode, I need to decode each line before I can split it.

Answer 1

You can use ast.literal_eval to convert the text of the bytes literal into a bytes object, then decode that to get a str object:

>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
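To see why the extra step is needed, here is a minimal sketch (the variable names field, raw and text are illustrative, not from the question). It shows that a text-mode read yields the escape sequences as ordinary characters, including literal backslashes, rather than the bytes they describe:

```python
import ast

# What a text-mode read hands back for the second column: the repr of a
# bytes object, made of ordinary characters including literal backslashes.
field = r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"

print(len(r"\xe2"))        # 4 characters: backslash, x, e, 2
print("\u2014" in field)   # False: the text holds no real em dash yet

raw = ast.literal_eval(field)   # bytes object with the real byte values
text = raw.decode("utf-8")      # str with the real characters

print(isinstance(raw, bytes))   # True
print("\u2014" in text)         # True
```

Unlike eval, ast.literal_eval only accepts Python literals, so it is a reasonable choice for data read from a file.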

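Applied to the code in the question:

import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        # Optionally skip lines that have no second tab-separated field:
        # if len(tokens) < 2:
        #    continue
        bytes_part = ast.literal_eval(tokens[1])  # Text of the bytes literal -> bytes
        s = bytes_part.decode('utf-8')  # Decode the bytes to convert to a string
        print(s)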
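If some lines may be malformed, the same idea can be wrapped in small helpers. This is only a sketch under assumptions: the names decode_bytes_literal and read_rows are hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import ast
import io

def decode_bytes_literal(field):
    """Decode a field holding the repr of a bytes object; None if invalid."""
    try:
        value = ast.literal_eval(field.strip())
    except (ValueError, SyntaxError):
        return None          # not a Python literal at all
    if not isinstance(value, bytes):
        return None          # a literal, but not a bytes literal
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return None          # bytes that are not valid UTF-8

def read_rows(lines):
    """Yield (id, text) pairs from tab-separated lines, skipping bad rows."""
    for line in lines:
        tokens = line.rstrip("\n").split("\t")
        if len(tokens) < 2:
            continue
        text = decode_bytes_literal(tokens[1])
        if text is not None:
            yield tokens[0], text

# Stand-in for the real file: behaves like a text-mode file object.
sample = io.StringIO(
    "id-1\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n"
    "broken line without a tab\n"
)
rows = list(read_rows(sample))
print(rows)
```

With a real file, pass the object returned by open(file, 'r', encoding='utf-8') to read_rows instead of the StringIO stand-in.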

Answer 1 (score 3, accepted), 2017-04-11 05:46:01
