encoding error : input conversion failed due to input error when using BeautifulSoup
I am using threads to speed up my code. This is my main code:
t1 = Thread(target=process_multiple_pages, args=(splited_page_number_list[0]))
t2 = Thread(target=process_multiple_pages, args=(splited_page_number_list[1]))
t1.start()
t2.start()
t1.join()
t2.join()
splited_page_number_list is a list containing 2 lists: [[1, 2, 3, 4, .... 57], [58, 59, 60, 61, .... 114]]
process_multiple_pages() processes every page number in the list it receives:
def process_multiple_pages(page_number_list):
    for i in page_number_list:
        process_page(i)
process_page() fetches the page for the given page number and processes it:
def process_page(page_number, subcategory_url):
    soup = BeautifulSoup(requests.get("http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber={}".format(str(page_number))).text, "lxml")
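When I run my code it works, but it prints the warning encoding error : input conversion failed due to input error, bytes 0xEB 0x85 0x84 0x20, and sometimes I/O error : encoder error.
When I run the same code without multithreading, using a single page_number_list, no warning appears. Why does this happen? Can I ignore it?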
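I can't run your code, but I can see a few mistakes.
The main mistake is that args= needs a tuple of arguments, but parentheses alone do not create a tuple. To create a tuple you need a comma (,).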
args = splited_page_number_list[0], # <-- `,` at the end to create tuple
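But when you create the tuple inside the Thread() call, you have to wrap it in () to show that this comma creates a tuple rather than separating other arguments of Thread():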
args = ( splited_page_number_list[0], )
t1 = Thread(target=process_multiple_pages, args=(splited_page_number_list[0],) ) # `,`
t2 = Thread(target=process_multiple_pages, args=(splited_page_number_list[1],) ) # `,`
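The difference is easy to check directly. The sketch below is only an illustration: pages is a short hypothetical stand-in for splited_page_number_list[0], and collect is a made-up target that records whatever the thread passes to it. Without the trailing comma, Thread unpacks the list and passes every page number as a separate positional argument.

```python
from threading import Thread

pages = [1, 2, 3]  # hypothetical stand-in for splited_page_number_list[0]

not_a_tuple = (pages)   # parentheses alone only group; this is still the list
one_tuple = (pages,)    # the trailing comma makes a one-element tuple

received = []

def collect(*args):
    # Record exactly what the thread passed to the target function.
    received.append(args)

# Correct form: the target receives the whole list as one argument.
t = Thread(target=collect, args=one_tuple)
t.start()
t.join()

# Form from the question: the list itself is unpacked into 3 arguments.
t = Thread(target=collect, args=not_a_tuple)
t.start()
t.join()

print(type(not_a_tuple).__name__, type(one_tuple).__name__)
print(received)
```

With the real 57-element list, the second form would call process_multiple_pages with 57 positional arguments, which a one-parameter function rejects.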
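There is another mistake (it should raise an error): you define the function with two parameters
def process_page(page_number, subcategory_url):
but later you call it with a single value
process_page(i)
This should raise an error, because the second argument is missing.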
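As a minimal sketch of that second problem, the snippet below keeps the two-parameter signature from the question but replaces the body with a stand-in (no download is performed), then calls it with one argument and captures the resulting exception.

```python
def process_page(page_number, subcategory_url):
    # Stand-in body; the real function would download and parse the page.
    return page_number, subcategory_url

try:
    process_page(1)  # the second argument is missing
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

When such an exception is raised inside a worker thread, Python prints the traceback for that thread and the main program keeps running, so the failure can be easy to overlook among other output.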
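Frankly, I would use ThreadPool instead of two separate threads. You can pass all the page numbers as a single list (without splitting it) and set the pool size to 2, so it runs with only two threads.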
import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool

# --- functions ---

def process_page(page_number):
    url = "http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber={}".format(page_number)
    response = requests.get(url)
    print(f'{response.status_code} | {url}')
    #soup = BeautifulSoup(response.text, "lxml")
    #... rest ...
    return f"some result from page {page_number}"

# --- main ---

page_numbers = range(1, 11)

with ThreadPool(2) as p:
    results = p.map(process_page, page_numbers)

for item in results:
    print(item)
Result:
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=3
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=1
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=2
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=4
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=5
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=7
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=8
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=6
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=9
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=10
some result from page 1
some result from page 2
some result from page 3
some result from page 4
some result from page 5
some result from page 6
some result from page 7
some result from page 8
some result from page 9
some result from page 10
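Note that the status lines above are printed in the order the requests finish, while the results come back in the order of the input page numbers. The sketch below reproduces this without any network access; fake_process_page is a hypothetical stand-in that sleeps instead of downloading, arranged so that later pages finish sooner.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_process_page(page_number):
    # Hypothetical stand-in for the real download: later pages finish sooner.
    time.sleep(0.01 * (5 - page_number))
    return f"some result from page {page_number}"

with ThreadPool(2) as pool:
    # map() blocks until every task is done and keeps the input order.
    results = pool.map(fake_process_page, range(1, 5))

print(results)
```

Because map() keeps the input order, the results can be matched to page numbers by position even though the pages are fetched concurrently.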