How do I remove the b from the beginning of each string read from a file?

Question

I'm reading a CSV file as follows.

import pandas as pd
data = pd.read_csv('news.csv')

It contains news and category as columns. I need to tokenize the words in the news column. The problem is that each line of text in the news column has b at the beginning.

b'Longevity Increase Seen Around the World: WHO'
b'Chikungunya spreading, mosquito-borne virus ...

I tried the answers to "How do I get rid of the b-prefix in a string in python?", but those are for byte-encoded strings. So,

line = data['news'][0]
line.decode('utf-8')

would cause:

AttributeError: 'str' object has no attribute 'decode'

Each of those lines is of type str. How do I remove those b's?

Answer 1

The b'' may indicate a bytes object, which can be decoded to a string, but it may also be an ordinary string whose content is literally b'...'.
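One way to tell the two cases apart is to check the value's type and its first and last characters. The following is a minimal sketch; `describe` is a hypothetical helper name used only for illustration, not part of pandas or the standard library.

```python
def describe(value):
    """Classify a value as real bytes, a str that looks like a bytes literal, or a plain str."""
    if isinstance(value, bytes):
        return "bytes"
    looks_like_literal = (
        isinstance(value, str)
        and len(value) >= 3
        and value[:2] in ("b'", 'b"')   # starts with b' or b"
        and value[-1] == value[1]       # ends with the matching quote
    )
    if looks_like_literal:
        return "str containing a bytes literal"
    return "plain str"

print(describe(b'Longevity'))     # a real bytes object
print(describe("b'Longevity'"))   # a str whose text happens to start with b'
print(describe("Longevity"))      # an ordinary str
```

Given the AttributeError in the question, the values in the news column presumably fall into the second category.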

In the first case you need line.decode(); in the second case you need line[2:-1], which drops the leading b' and the trailing quote.
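Both cases can be handled by one function. The sketch below is an illustration under assumptions, not the answer's exact method: `strip_bytes_literal` is a hypothetical helper, the second sample row is made up, and it swaps `line[2:-1]` for `ast.literal_eval` so that escape sequences such as `\xc3\xa9` inside the text are turned back into real characters (plain slicing would leave them as literal backslashes).

```python
import ast

def strip_bytes_literal(text, encoding="utf-8"):
    """Return decoded text for bytes or for a str like "b'...'"; leave other values unchanged."""
    if isinstance(text, bytes):
        # First case: a real bytes object.
        return text.decode(encoding)
    if isinstance(text, str) and text[:2] in ("b'", 'b"'):
        # Second case: a str that contains the repr of a bytes object.
        try:
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return text  # not a well-formed literal (e.g. truncated), leave as-is
        if isinstance(value, bytes):
            return value.decode(encoding)
    return text

# Sample rows standing in for the news column; the second one is hypothetical.
rows = [
    "b'Longevity Increase Seen Around the World: WHO'",
    "b'Caf\\xc3\\xa9 opens'",
]
cleaned = [strip_bytes_literal(row) for row in rows]
print(cleaned)
```

With the DataFrame from the question, the same function could be applied to the whole column with data['news'] = data['news'].apply(strip_bytes_literal).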

Solution 1
1 2020-10-16 12:13:40
