Web scraper loop returns nothing

Question

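I wrote this simple web scraper to scrape newegg.com. I added a for loop to print each product's name, price, and shipping cost.

However, when I run the for loop it prints nothing and raises no error. Before writing the for loop, I ran the lines that are now commented out, and they printed the details of only one product.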

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

#prod = soup.find('a', class_='item-title').text
#price = soup.find('li', class_='price-current').text.strip()
#ship = soup.find('li', class_='price-ship').text.strip()
#print(prod.strip())
#print(price.strip())
#print(ship)

for info in soup.find_all('div', class_='item-container  '):
    prod = soup.find('a', class_='item-title').text
    price = soup.find('li', class_='price-current').text.strip()
    ship = soup.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    #price.splitlines()[3].replace('\xa0', '')
    print(price.strip())
    print(ship)

Answer 1

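Apart from the stray-space typo and the indentation, you never actually use info inside the for loop, so it would just keep printing the first item. Use info inside the loop wherever you currently use soup.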

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container'):
    prod = info.find('a', class_='item-title').text.strip()
    price = info.find('li', class_='price-current').text.strip().splitlines()[1].replace(u'\xa0', '')
    if  u'$' not in price:
        price = info.find('li', class_='price-current').text.strip().splitlines()[0].replace(u'\xa0', '')
    ship = info.find('li', class_='price-ship').text.strip()
    print(prod)
    print(price)
    print(ship)

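Your loop is written as for info in soup.....:, but its body calls soup.find(..) instead of info.find(..), so every iteration searches the whole document and returns only the first match, for example the first result of soup.find('a', class_='item-title'). If you use info.find(....), each iteration of the for loop searches inside the next <div> element.

Edit: I also found that when you use .splitlines(), the price is not always the second item; sometimes it is the first. To handle this, I added a check for whether the item contains the "$" sign. If it does not, the first list item is used.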
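The difference between searching from soup and searching from info can be reproduced offline. The sketch below is only an illustration: the HTML string and product names are made up, and it uses the built-in html.parser instead of lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real listing page.
HTML = """
<div class="item-container">
  <a class="item-title">Console A</a>
</div>
<div class="item-container">
  <a class="item-title">Console B</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
containers = soup.find_all("div", class_="item-container")

# Searching from `soup` restarts at the top of the document on every pass.
from_soup = [soup.find("a", class_="item-title").text for _ in containers]

# Searching from `info` stays inside the current container.
from_info = [info.find("a", class_="item-title").text for info in containers]

print(from_soup)  # ['Console A', 'Console A']
print(from_info)  # ['Console A', 'Console B']
```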
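A slightly more general variant of that check is to scan every line and keep the first one containing "$", rather than testing only the first two positions. This is a sketch of an alternative, not the code from the answer above; extract_price is a hypothetical helper name and the sample strings are invented.

```python
def extract_price(raw_text):
    """Return the first line of raw_text that contains '$', or '' if none does."""
    for line in raw_text.strip().splitlines():
        # Drop non-breaking spaces and surrounding whitespace before checking.
        cleaned = line.replace("\xa0", "").strip()
        if "$" in cleaned:
            return cleaned
    return ""


# Invented samples: the price may sit on the first or the second line.
print(extract_price("$299.99\xa0\n(2 Offers)"))  # $299.99
print(extract_price("|\n$349.99\xa0"))           # $349.99
print(repr(extract_price("Out of stock")))       # ''
```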

Answer 2

With less code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text
soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container '):
    print(info.find('a', class_='item-title').text)
    print(info.find('li', class_='price-current').text.strip())
    print(info.find('li', class_='price-ship').text.strip())
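One thing to keep in mind with the compact form: find() returns None when a container has no matching tag, and calling .text on None raises AttributeError. A minimal sketch of a guard follows, assuming some listings might lack a shipping line. The markup, the text_of helper, and the "N/A" default are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second product has no shipping entry.
HTML = """
<div class="item-container">
  <a class="item-title">Console A</a>
  <ul><li class="price-current">$299.99</li><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title">Console B</a>
  <ul><li class="price-current">$349.99</li></ul>
</div>
"""


def text_of(parent, name, css_class, default="N/A"):
    """Return the stripped text of the first matching tag, or default if absent."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(HTML, "html.parser")
rows = [
    (
        text_of(info, "a", "item-title"),
        text_of(info, "li", "price-current"),
        text_of(info, "li", "price-ship"),
    )
    for info in soup.find_all("div", class_="item-container")
]

for row in rows:
    print(row)
```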

Answer 3

@Rick, you mistakenly added an extra space after the attribute value in the line for info in soup.find_all('div', class_='item-container '):. With that extra space removed, the code below will work as you expect.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container '):
    prod = info.find('a', class_='item-title').text
    price = info.find('li', class_='price-current').text.strip()
    ship = info.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    print(price.strip())
    print(ship)

Hope this solves your problem...
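The effect of extra whitespace in the class_ argument can be checked on a small inline document. BeautifulSoup compares a class_ string against each individual class and against the attribute value as a whole, so a padded string matches neither. The markup below is made up for illustration. How a single trailing space behaves may depend on the installed bs4 version, so the sketch only covers the two clear cases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one single-class and one multi-class container.
HTML = """
<div class="item-container">one</div>
<div class="item-container featured">two</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The bare class name matches any tag that carries that class.
exact = soup.find_all("div", class_="item-container")

# Two trailing spaces: the string equals no class and no full attribute value.
padded = soup.find_all("div", class_="item-container  ")

print(len(exact))   # 2
print(len(padded))  # 0
```

Because find_all() simply returns an empty list when nothing matches, the for loop body never runs, which is why the original script printed nothing and raised no error.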

3 answers

Answer 1
2 2019-01-04 22:04:45

Answer 2
2 2019-01-04 22:06:32

Answer 3
-2 Accepted 2019-01-04 21:59:47
