
How can I encode a string in Python into UTF-8?

I'm running into an encoding problem. I think it has to do with the string I build to make my loop work.

from urllib import request
from bs4 import BeautifulSoup
import time
import pprint

f1 = open('urls2.txt','r',encoding="utf8")
ethnicity_urls = f1.readlines()
f1.close()

for each in ethnicity_urls:
    time.sleep(1.5)
    scraped = request.urlopen(each)
    soup = BeautifulSoup(scraped)
    soup1 = soup.select('p')
    for e in soup1:
        soup2 = str(soup1)
        soup2 = soup2.replace('\n','')
        soup2 = soup2.replace('<p>','')
        soup2 = soup2.replace('</p>','')
        print(soup2)
    resultFile = open('results2.csv','a')
    resultFile.write(pprint.pformat(soup2))
    resultFile.close()

The error I get is:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 45: ordinal not in range(128)

What is the correct way to encode soup2? I tried

soup2 = soup2.encode('utf-8')

which gives me the error

TypeError: a bytes-like object is required, not 'str'

Do I need to use URL encoding for this?

Any help is appreciated.

Edit: full traceback

Traceback (most recent call last):

  File "<ipython-input-35-d2bad914e060>", line 3, in <module>
    scraped = request.urlopen(each)

  File "D:\Anaconda\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "D:\Anaconda\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)

  File "D:\Anaconda\lib\urllib\request.py", line 543, in _open
    '_open', req)

  File "D:\Anaconda\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "D:\Anaconda\lib\urllib\request.py", line 1347, in http_open
    return self.do_open(http.client.HTTPConnection, req)

  File "D:\Anaconda\lib\urllib\request.py", line 1319, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))

  File "D:\Anaconda\lib\http\client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)

  File "D:\Anaconda\lib\http\client.py", line 1263, in _send_request
    self.putrequest(method, url, **skips)

  File "D:\Anaconda\lib\http\client.py", line 1118, in putrequest
    self._output(self._encode_request(request))

  File "D:\Anaconda\lib\http\client.py", line 1198, in _encode_request
    return request.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 45: ordinal not in range(128)
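
The last frames of the traceback show the failure happening while http.client encodes the request line as ASCII, which suggests the offending character is in the URL passed to urlopen rather than in soup2. A minimal sketch for locating such characters, assuming the URLs are held in a list of strings; the helper name find_non_ascii and the sample URLs are made up for illustration:

```python
def find_non_ascii(urls):
    """Return (line_number, position, character) for every non-ASCII character."""
    problems = []
    for line_number, url in enumerate(urls, start=1):
        for position, char in enumerate(url):
            if ord(char) > 127:
                problems.append((line_number, position, char))
    return problems


# Hypothetical sample data; '\xa0' is a non-breaking space.
sample_urls = [
    "http://example.com/some%20page\n",
    "http://example.com/some\xa0page\n",
]

for line_number, position, char in find_non_ascii(sample_urls):
    print(f"line {line_number}, position {position}: {char!r}")
```

Running the same helper over the list returned by readlines() would point at the lines of the input file that need attention.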

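Thanks, everyone, for your comments. After manually checking all of the input data, I was able to pin down the cause of the error. During my data conversion, some spaces were not recognized and therefore were not replaced by %20. If you run into the same problem and the code works on a small, double-checked set of input data, you may want to check your input file.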
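
One way to guard against this kind of input is to normalize each line before passing it to urlopen. The following is only a sketch under a few assumptions: that a non-breaking space ('\xa0') should be treated like an ordinary space, and that the characters listed in safe are the delimiters worth preserving. The helper name clean_url is hypothetical.

```python
from urllib.parse import quote


def clean_url(raw_line):
    """Strip surrounding whitespace and percent-encode unsafe characters."""
    # strip() removes the trailing newline left by readlines(), along with
    # any leading or trailing non-breaking spaces.
    url = raw_line.strip()
    # Assumption: a non-breaking space inside the URL was meant as a space.
    url = url.replace("\xa0", " ")
    # Keep URL delimiters and existing %XX escapes; encode everything else
    # (spaces become %20, other non-ASCII characters become UTF-8 escapes).
    return quote(url, safe=":/?&=#%")


print(clean_url("http://example.com/some\xa0page\n"))
```

In the loop from the question, this could be applied as scraped = request.urlopen(clean_url(each)). Including '%' in safe avoids double-encoding URLs that already contain %20, at the cost of not escaping a stray literal percent sign.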

