简体   繁体   中英

I want to web crawl, but some items are crawled, but some items are not crawled. I do not know the cause

I'm using BeautifulSoup in python, to scrape a website.

While the addrs , a_earths was crawled, points = soup.select('.addr_point') at the end This section can't be crawled. I don't know the cause (the dashed red box in Image of webpage )

Following is code block I'm using:

import urllib.parse
from bs4 import BeautifulSoup
import re

url = 'http://www.dooinauction.com/auction/ca_list.php'

req = urllib.request.Request(url) #
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser') 

tots = soup.select('div.title_left font') #total
tot = int(re.findall('\d+', tots[0].text)[0]) 
print(f'total : {tot}건')

url = f'http://www.dooinauction.com/auction/ca_list.php?total_record={tot}&search_fm_off=1&search_fm_off=1&start=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

addrs = soup.select('.addr')  # crawling OK
a_earths = soup.select('.list_class.bold') #crawling OK
points = soup.select('.addr_point') #crawling NO
print()

Image of webpage

I browse your website and it seems that I can't see the addr_points section. I think maybe this is the reason.

Screenshot:

截屏

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM