How can I encode a string in Python into UTF-8?
I've run into an encoding problem. I think it has to do with the way I build a string to make my loop work.
from urllib import request
from bs4 import BeautifulSoup
import time
import pprint

f1 = open('urls2.txt','r',encoding="utf8")
ethnicity_urls = f1.readlines()
f1.close()

for each in ethnicity_urls:
    time.sleep(1.5)
    scraped = request.urlopen(each)
    soup = BeautifulSoup(scraped)
    soup1 = soup.select('p')
    for e in soup1:
        soup2 = str(soup1)
        soup2 = soup2.replace('\n','')
        soup2 = soup2.replace('<p>','')
        soup2 = soup2.replace('</p>','')
        print(soup2)
    resultFile = open('results2.csv','a')
    resultFile.write(pprint.pformat(soup2))
    resultFile.close()
The error I get is:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 45: ordinal not in range(128)
What is the correct way to encode soup2? I tried
soup2 = soup2.encode('utf-8')
which gives me the error
TypeError: a bytes-like object is required, not 'str'
Do I need to use URL encoding for this?
Any help is appreciated.
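One plausible reading of the TypeError, assuming the encode call was placed before the replace calls: in Python 3, str.encode returns bytes, and bytes.replace rejects str arguments. Below is a minimal sketch that keeps the text as str and leaves the encoding to the file object. The sample text and the temporary output path are made up for illustration and are not from the post.

```python
import os
import tempfile

# Made-up sample text containing a non-breaking space (U+00A0).
text = "<p>caf\xe9\xa0au lait</p>\n"

# Encoding first turns the str into bytes; bytes.replace() then rejects str arguments.
try:
    text.encode("utf-8").replace("\n", "")
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

# Keep working on str and let open() do the encoding at write time.
cleaned = text.replace("\n", "").replace("<p>", "").replace("</p>", "")

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "results_demo.csv")
with open(path, "a", encoding="utf-8") as result_file:
    result_file.write(cleaned)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as result_file:
    round_trip = result_file.read()
```

Passing encoding="utf-8" to open() also avoids depending on the platform's default encoding, which on Windows is often not UTF-8.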
EDIT: full traceback
Traceback (most recent call last):
  File "<ipython-input-35-d2bad914e060>", line 3, in <module>
    scraped = request.urlopen(each)
  File "D:\Anaconda\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Anaconda\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "D:\Anaconda\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "D:\Anaconda\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\Anaconda\lib\urllib\request.py", line 1347, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "D:\Anaconda\lib\urllib\request.py", line 1319, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "D:\Anaconda\lib\http\client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "D:\Anaconda\lib\http\client.py", line 1263, in _send_request
    self.putrequest(method, url, **skips)
  File "D:\Anaconda\lib\http\client.py", line 1118, in putrequest
    self._output(self._encode_request(request))
  File "D:\Anaconda\lib\http\client.py", line 1198, in _encode_request
    return request.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 45: ordinal not in range(128)
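The last frames are in http.client, which encodes the request line (including the URL path) as ASCII. That suggests the failing string is the URL passed to urlopen rather than soup2. The same exception can be reproduced without any network access. The URL below is a made-up placeholder, not one from the original input file.

```python
# Hypothetical URL containing a non-breaking space (U+00A0), as might come from a text file.
bad_url = "http://example.com/wiki/Some\xa0Page"

try:
    # Same kind of call as in http.client's _encode_request.
    bad_url.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # Report which character broke the encoding and where it sits in the string.
    print(repr(error.object[error.start]), "at index", error.start)
```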
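Thanks, everyone, for your input. After manually checking all the input data, I was able to identify the cause of the error. During my data conversion, some spaces were not recognized and therefore were not replaced by %20. If you run into the same problem and your code works on a small set of double-checked input data, you may want to check your input file.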
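Rather than fixing the input file by hand, the URLs could be normalized before they reach urlopen. This is a rough sketch under a few assumptions: the unrecognized spaces are non-breaking spaces (U+00A0, the character named in the error), the helper name clean_url is invented for illustration, and the sample lines are placeholders. It does not handle non-ASCII host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def clean_url(raw_line):
    """Return an ASCII-safe URL (illustrative helper, not from the post)."""
    # Strip surrounding whitespace, including the newline left by readlines(),
    # and treat any remaining U+00A0 as an ordinary space.
    url = raw_line.strip().replace("\xa0", " ")
    parts = urlsplit(url)
    # Percent-encode path and query; '%' is kept safe so existing escapes are not doubled.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Placeholder input lines: one with a non-breaking space, one already escaped.
lines = ["http://example.com/wiki/Some\xa0Page\n", "http://example.com/a%20b\n"]
cleaned = [clean_url(line) for line in lines]
```

In the loop from the question, this would amount to calling request.urlopen(clean_url(each)) instead of request.urlopen(each).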