encoding error : input conversion failed due to input error when using Beautifulsoup

I am using threads to speed up my code. This is my main code:

t1 = Thread(target=process_multiple_pages, args=(splited_page_number_list[0]))
t2 = Thread(target=process_multiple_pages, args=(splited_page_number_list[1]))
t1.start()
t2.start()
t1.join()
t2.join()

splited_page_number_list is a list containing 2 lists: [[1, 2, 3, 4, .... 57], [58, 59, 60, 61, .... 114]]

process_multiple_pages() processes every page number in the list it is given:

def process_multiple_pages(page_number_list):
    for i in page_number_list:
        process_page(i)

process_page() fetches the page for the given page number and processes it.

def process_page(page_number, subcategory_url):
    soup = BeautifulSoup(requests.get("http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber={}".format(str(page_number))).text, "lxml")

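When I run my code it works, but it warns "encoding error : input conversion failed due to input error, bytes 0xEB 0x85 0x84 0x20" and sometimes "I/O error : encoder error".

When I run this code without multithreading and with only one page_number_list, the warnings do not appear. Why does this happen? Can I ignore it?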

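I can't run your code, but I see some mistakes.

The main mistake: args= needs a tuple of arguments, but () alone does not create a tuple. To create a tuple you need a comma `,`.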

args = splited_page_number_list[0],  # <-- `,` at the end to create tuple

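But when you create the tuple inside the Thread() call, you must wrap it in () to show that this `,` creates a tuple rather than separating it from the other arguments of Thread()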

args = ( splited_page_number_list[0], )

t1 = Thread(target=process_multiple_pages, args=(splited_page_number_list[0],) )  # `,`
t2 = Thread(target=process_multiple_pages, args=(splited_page_number_list[1],) )  # `,`
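To make the difference concrete, here is a minimal sketch that needs no network access. The short `pages` list is a hypothetical stand-in for `splited_page_number_list[0]`, and the function body only records what it receives. Thread calls the target as `target(*args)`, so passing the bare list would unpack it into one positional argument per page number.

```python
from threading import Thread

pages = [1, 2, 3]  # hypothetical stand-in for splited_page_number_list[0]

# Parentheses alone only group an expression; the comma makes the tuple.
assert (pages) is pages
assert isinstance((pages,), tuple)

received = []

def process_multiple_pages(page_number_list):
    # Record what the thread actually passed in.
    received.append(page_number_list)

# With the trailing comma the whole list arrives as one argument.
t = Thread(target=process_multiple_pages, args=(pages,))
t.start()
t.join()

print(received)
```

With `args=(pages)` instead, the thread would call `process_multiple_pages(1, 2, 3)` and fail with a TypeError inside the thread.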

Another mistake (which should raise an error):

You define the function with two parameters

def process_page(page_number, subcategory_url):

but later you call it with a single value

process_page(i)

This should raise a TypeError.
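A small sketch of that failure, using the same signature as in the question. The function body here is a placeholder, not the original processing code.

```python
def process_page(page_number, subcategory_url):
    # Placeholder body; the real function fetches and parses a page.
    return page_number, subcategory_url

try:
    process_page(1)  # second argument is missing
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Either drop the unused `subcategory_url` parameter or pass a value for it on every call.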


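Frankly, I would use ThreadPool instead of two threads. You can send all the numbers as one list (without splitting it) and set the pool size to two, so it runs with only two threads.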

import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool

# --- functions ---

def process_page(page_number):
    url = "http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber={}".format(page_number)
    response = requests.get(url)
    print(f'{response.status_code} | {url}')
    
    #soup = BeautifulSoup(response.text, "lxml")
    #... rest ...

    return f"some result from page {page_number}"

# --- main ---

page_numbers = range(1, 11)

with ThreadPool(2) as p:
    results = p.map(process_page, page_numbers)
    
    for item in results:
        print(item)

Result

200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=3
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=1
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=2
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=4
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=5
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=7
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=8
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=6
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=9
200 | http://www.yes24.com/24/Category/Display/001001046013001?FetchSize=40&PageNumber=10
some result from page 1
some result from page 2
some result from page 3
some result from page 4
some result from page 5
some result from page 6
some result from page 7
some result from page 8
some result from page 9
some result from page 10
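Note that the status lines are printed in the order the requests finish, while the results come back in page order, because `map()` returns results in the order of its input. A minimal sketch of that behaviour without any network access, where `fake_process_page` and its sleep times are hypothetical stand-ins for the real request:

```python
import time
from multiprocessing.pool import ThreadPool

def fake_process_page(page_number):
    # Hypothetical stand-in: earlier pages sleep longer, so they finish later.
    time.sleep(0.01 * (6 - page_number))
    return f"some result from page {page_number}"

with ThreadPool(2) as pool:
    # map() blocks until all calls finish and keeps the input order.
    results = pool.map(fake_process_page, range(1, 6))

for item in results:
    print(item)
```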
