How to remove b from the beginning of each line of string read from file?
I'm reading a CSV file as follows.
data = pd.read_csv('news.csv')
It contains news and category as columns. I need to tokenize the words in the news column. The problem is that each line of text in the news column has b at the beginning:
b'Longevity Increase Seen Around the World: WHO'
b'Chikungunya spreading, mosquito-borne virus ...
I tried the answers to "How do I get rid of the b-prefix in a string in python?", but they apply to byte-encoded strings. So,
line = data['news'][0]
line.decode('utf-8')
would cause:
AttributeError: 'str' object has no attribute 'decode'
Each of those lines is of type str. How do I remove those b's?
This b'' may indicate a bytes object that can be decoded to the string '', but it could also be a str whose content is literally b'...'.
For the first case you need line.decode(); for the second case you need line[2:-1], which drops the leading b' and the trailing quote.
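The two cases above can be combined into one helper. This is only a minimal sketch: the function name strip_bytes_prefix is hypothetical, UTF-8 is assumed as the encoding, and ast.literal_eval is used in place of plain slicing so that escape sequences such as \xc3\xa9 inside the text are resolved as well. When the text is not a valid bytes literal (for example, because it was truncated), the sketch falls back to removing just the prefix and, if present, the closing quote.

```python
import ast


def strip_bytes_prefix(value, encoding="utf-8"):
    """Return plain text for a bytes object or for a str that looks like b'...'.

    Hypothetical helper; the encoding default is an assumption.
    """
    # Case 1: a real bytes object, so decoding is enough.
    if isinstance(value, bytes):
        return value.decode(encoding)

    # Case 2: a str whose content is the printed form of a bytes object.
    if isinstance(value, str) and value[:2] in ("b'", 'b"'):
        try:
            # literal_eval parses the literal safely and resolves escapes.
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a valid literal: drop the prefix and, if present, the closing quote.
            quote = value[1]
            body = value[2:]
            return body[:-1] if body.endswith(quote) else body
        if isinstance(parsed, bytes):
            return parsed.decode(encoding)

    # Anything else is returned unchanged.
    return value


print(strip_bytes_prefix("b'Longevity Increase Seen Around the World: WHO'"))
```

Assuming data is the DataFrame from the question, the helper could be applied to the whole column with data['news'].apply(strip_bytes_prefix). If the CSV can be regenerated, decoding the bytes before they are written to the file avoids the problem at its source.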