How do I get rid of the b prefix in a Python string?

Question

I am importing a bunch of tweets, and when I read them they come out like this:

b'I posted a new photo to Facebook'

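I gather the b indicates that it is a bytes object. This is proving problematic because in the CSV files I end up writing, the b does not go away and interferes with later code.

Is there a simple way to remove this b prefix from my lines of text?

Keep in mind that I seem to need the text encoded as utf-8, or tweepy has trouble pulling the tweets from the web.

Here is the content I am analyzing: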

https://www.dropbox.com/s/sjmsbuhrghj7abt/new_tweets.txt?dl=0

new_tweets = 'content in the link'

Code attempt

outtweets = [[tweet.text.encode("utf-8").decode("utf-8")] for tweet in new_tweets]
print(outtweets)

Error

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-21-6019064596bf> in <module>()
      1 for screen_name in user_list:
----> 2     get_all_tweets(screen_name,"instance file")

<ipython-input-19-e473b4771186> in get_all_tweets(screen_name, mode)
     99             with open(os.path.join(save_location,'%s.instance' % screen_name), 'w') as f:
    100                 writer = csv.writer(f)
--> 101                 writer.writerows(outtweets)
    102         else:
    103             with open(os.path.join(save_location,'%s.csv' % screen_name), 'w') as f:

C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>

Answer 1

Decode the bytes to produce a str:

b = b'1234'
print(b.decode('utf-8'))  # '1234'
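A minimal sketch of applying this to a whole list of tweets, assuming each item may arrive as either bytes or str. The helper name to_text and the sample data are hypothetical and do not come from the question:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is a bytes object."""
    if isinstance(value, (bytes, bytearray)):
        # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
        return value.decode(encoding, errors="replace")
    return value


# Hypothetical input: one bytes item and one item that is already text
raw_tweets = [b"I posted a new photo to Facebook", "already a str"]
texts = [to_text(t) for t in raw_tweets]
print(texts)
```

Decoding once at the boundary like this means the rest of the code only ever sees str, so no b prefix can leak into later output.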

Answer 2

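The object you are printing is not a string but a bytes object, which Python displays as a bytes literal.

Consider creating a bytes object by typing a bytes literal (that is, defining the object literally, for example by typing b'') and then converting it into a string object by decoding it as utf-8. (Note that converting here means decoding.)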

byte_object= b"test" # byte object by literally typing characters
print(byte_object) # Prints b'test'
print(byte_object.decode('utf8')) # Prints "test" without quotations

We simply called the .decode('utf8') method.
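To illustrate where the prefix comes from: the b belongs to the way Python displays a bytes object, not to the data itself. A small sketch (the variable names are illustrative only):

```python
data = b"test"

# str() on a bytes object returns its printed form, prefix and quotes included
as_repr = str(data)

# decode() interprets the bytes as text and returns a real str
as_text = data.decode("utf-8")

print(type(data).__name__)
print(as_repr)
print(as_text)
print(len(as_repr), len(as_text))
```

This is why calling str() on bytes, or formatting bytes directly into a CSV row or URL, carries the b'...' wrapper along, while decode() does not.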

String literals are described by the following lexical definitions:

https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>

bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::=  shortbyteschar | bytesescapeseq
longbytesitem  ::=  longbyteschar | bytesescapeseq
shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar  ::=  <any ASCII character except "\">
bytesescapeseq ::=  "\" <any ASCII character>

Answer 3

You need to decode it to convert it to a string. See the answer here about bytes literals in Python 3.

b'I posted a new photo to Facebook'.decode('utf-8')
# 'I posted a new photo to Facebook'

Answer 4

How to remove the b' ' characters when the value is the result of a decode operation (here, base64) in Python:

import base64
a='cm9vdA=='
b=base64.b64decode(a).decode('utf-8')
print(b)

Answer 5

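On Python 3.6 with Django 2.0, decoding a bytes literal does not work as expected. Yes, I get the right result when I print it, but the b'value' is still there even though it prints correctly.

This is what I am encoding:

'uid': urlsafe_base64_encode(force_bytes(user.pk)),

This is what I am decoding: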

uid = force_text(urlsafe_base64_decode(uidb64))

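This is what the Django 2.0 documentation says:

urlsafe_base64_encode(s)[source]

Encodes a bytestring in base64 for use in URLs, stripping any trailing equal signs.

urlsafe_base64_decode(s)[source]

Decodes a base64 encoded string, adding back any trailing equal signs that might have been stripped.

This is my account_activation_email_test.html file: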

{% autoescape off %}
Hi {{ user.username }},

Please click on the link below to confirm your registration:

http://{{ domain }}{% url 'accounts:activate' uidb64=uid token=token %}
{% endautoescape %}

This is my console response:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Activate Your MySite Account
From: webmaster@localhost
To: testuser@yahoo.com
Date: Fri, 20 Apr 2018 06:26:46 -0000
Message-ID: <152420560682.16725.4597194169307598579@Dash-U>

Hi testuser,

Please click on the link below to confirm your registration:
 http://127.0.0.1:8000/activate/b'MjU'/4vi-fasdtRf2db2989413ba/

As you can see, uid = b'MjU'

Expected: uid = MjU

Testing in the console:

$ python
Python 3.6.4 (default, Apr  7 2018, 00:45:33) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.utils.http import urlsafe_base64_encode, urlsafe_base64_decode
>>> from django.utils.encoding import force_bytes, force_text
>>> var1=urlsafe_base64_encode(force_bytes(3))
>>> print(var1)
b'Mw'
>>> print(var1.decode())
Mw
>>>

After investigating, it seems to be related to Python 3. My workaround was simple:

'uid': user.pk,

I receive it as uidb64 in my activate function:

user = User.objects.get(pk=uidb64)

And voila:

Content-Transfer-Encoding: 7bit
Subject: Activate Your MySite Account
From: webmaster@localhost
To: testuser@yahoo.com
Date: Fri, 20 Apr 2018 20:44:46 -0000
Message-ID: <152425708646.11228.13738465662759110946@Dash-U>


Hi testuser,

Please click on the link below to confirm your registration:

http://127.0.0.1:8000/activate/45/4vi-3895fbb6b74016ad1882/

Now it works fine.
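An alternative to dropping the encoding entirely is to decode the encoded bytes before placing them in the URL, as the console test above does with var1.decode(). The sketch below uses only the standard library base64 module to mimic that behaviour; the helpers encode_uid and decode_uid are hypothetical names and are not Django functions:

```python
import base64


def encode_uid(pk):
    """URL-safe base64 of a primary key, returned as str rather than bytes."""
    raw = str(pk).encode("utf-8")
    token = base64.urlsafe_b64encode(raw).rstrip(b"=")  # drop trailing padding
    return token.decode("ascii")  # str, so templates render it without b'...'


def decode_uid(token):
    """Reverse encode_uid, restoring any padding that was stripped."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


uid = encode_uid(25)
print(uid)
print(decode_uid(uid))
```

Under these assumptions the link would contain MjU instead of b'MjU', and the primary key stays out of the URL in plain form.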

Answer 6

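Assuming you do not want to decode it right away as others here suggest, you can convert it to a string and then strip the leading b' and the trailing '.

x = "Hi there 😄".encode("utf-8")
x  # b'Hi there \xf0\x9f\x98\x84'
# str() gives the printed form; slicing drops the leading b' and the trailing '
str(x)[2:-1]
# 'Hi there \\xf0\\x9f\\x98\\x84' (non-ASCII bytes stay as escaped text)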

Answer 7

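I got it working by encoding only the output as utf-8. Here is the code example:

import csv

new_tweets = api.GetUserTimeline(screen_name=user, count=200)
result = new_tweets[0]
# fall back to an empty string if the text cannot be read
try:
    text = result.text
except Exception:
    text = ''

# encoding='utf-8' avoids the cp1252 'charmap' error; newline='' is what the csv module expects
with open(file_name, 'a', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    # writerow takes one row as a list of fields; writerows(text) would write one row per character
    writer.writerow([text])

In other words: do not encode when collecting data from the API; encode only the output (when printing or writing).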
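A self-contained sketch of the same idea that runs without any Twitter client, assuming the tweets are already str objects. The sample texts and the temporary file path are made up for illustration:

```python
import csv
import os
import tempfile

# Hypothetical tweet texts, kept as str (never call .encode() on them yourself)
tweets = ["I posted a new photo to Facebook", "Hi there \U0001F604"]

path = os.path.join(tempfile.mkdtemp(), "tweets.csv")

# Let open() do the encoding; an explicit utf-8 avoids the platform default (e.g. cp1252)
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([t] for t in tweets)  # one single-column row per tweet

# Read the file back to confirm the text round-trips without any b'...' wrapper
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

Passing encoding to open() addresses the 'charmap' UnicodeEncodeError from the question, which arises when the file is opened with the Windows default codec.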

Answer 8

In addition to @hiro protagonist's answer, you can also convert bytes to a string by passing the character set to str:

b = b'1234'
str(b,'utf-8') # '1234'

Answer 9

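Although this question is old, I think this may help anyone facing the same problem. Here the text is a string, like this:

text= "b'I posted a new photo to Facebook'"

So you cannot remove the b by decoding, because the value is not a bytes object. I did the following to remove it.

# split on the leading b', then drop the closing quote that would otherwise remain
cleaned_text = text.split("b'")[1].rstrip("'")

This gives "I posted a new photo to Facebook"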
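If the string really is the printed form of a bytes object, an alternative sketch is to parse it back into bytes with ast.literal_eval from the standard library and then decode it. This assumes the text is a well-formed bytes literal; the variable names are illustrative:

```python
import ast

text = "b'I posted a new photo to Facebook'"

# literal_eval safely evaluates Python literals, including bytes literals
value = ast.literal_eval(text)
cleaned_text = value.decode("utf-8")
print(cleaned_text)

# Escaped non-ASCII bytes are restored too, which string slicing cannot do
escaped = "b'Hi there \\xf0\\x9f\\x98\\x84'"
restored = ast.literal_eval(escaped).decode("utf-8")
print(restored)
```

Unlike splitting on b', this also handles double-quoted literals and escape sequences, but it raises an exception on input that is not a valid literal.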

9 answers

Answer 1: score 202, accepted, 2017-01-29 08:09:50
Answer 2: score 25, 2017-04-28 12:47:40
Answer 3: score 8, 2017-01-29 08:10:39
Answer 4: score 7, 2018-09-05 07:57:27
Answer 5: score 3, 2018-04-20 07:10:15
Answer 6: score 2, 2020-02-21 03:46:03
Answer 7: score 1, 2017-04-26 16:58:15
Answer 8: score 1, 2022-06-01 11:56:46
Answer 9: score -2, 2018-02-20 07:45:33
