I want to crawl a web page, but some items are scraped and others are not, and I don't know the cause
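I am using BeautifulSoup in Python to scrape a website.
addrs and a_earths are scraped correctly, but the last selector, points = soup.select('.addr_point'), returns nothing.
I don't know why this section cannot be scraped (it is the red dashed box in the screenshot of the page).
Here is the code I am using: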
import urllib.request  # the code below calls urllib.request, which "import urllib.parse" does not load
from bs4 import BeautifulSoup
import re
url = 'http://www.dooinauction.com/auction/ca_list.php'
req = urllib.request.Request(url)  # request for the first list page
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
tots = soup.select('div.title_left font') #total
tot = int(re.findall(r'\d+', tots[0].text)[0])  # raw string avoids an invalid-escape warning
print(f'total : {tot}건')
url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
addrs = soup.select('.addr') # crawling OK
a_earths = soup.select('.list_class.bold') #crawling OK
points = soup.select('.addr_point') #crawling NO
print()
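I looked at the site, but I can't find any addr_point elements on the page. I think that may be the reason: if the class is not in the HTML the server returns, soup.select('.addr_point') has nothing to match.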
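One way to check this guess is to count how many elements in the downloaded HTML actually carry each class before blaming the selector. The sketch below is only an illustration, not the asker's code: it uses the standard library's html.parser instead of BeautifulSoup so it runs on its own, the helper names ClassCounter and count_classes are made up for this example, and sample_html is a hypothetical stand-in for the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how many start tags carry each CSS class name."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One element may carry several classes, e.g. "list_class bold".
                self.counts.update(value.split())


def count_classes(html, class_names):
    """Return {class_name: number of elements carrying it} for the given HTML text."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return {name: parser.counts[name] for name in class_names}


# Hypothetical sample standing in for the downloaded page; not the real markup.
sample_html = """
<div class="addr">address 1</div>
<div class="list_class bold">item 1</div>
<div class="addr">address 2</div>
"""

report = count_classes(sample_html, ["addr", "list_class", "addr_point"])
print(report)  # {'addr': 2, 'list_class': 1, 'addr_point': 0}
```

With the question's script, the bytes returned by urlopen(...).read() would first need to be turned into text, for example html.decode(errors='replace'), before being passed to count_classes. A count of 0 for addr_point would mean the class is absent from the server's response (for instance because that part of the page is filled in by JavaScript or shown only under certain conditions, which is an assumption to verify), so no CSS selector could find it.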