Simple Python Web-scraper with Beautiful Soup
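I'm trying to scrape car data. The `id` attributes of the car links increase by 1 from one car to the next, but I can't figure out how to loop over them. Here's what I have: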
```python
import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.25thstauto.com/inventory.aspx?cursort=asc&pagesize=500').read()
soup = bs.BeautifulSoup(source, 'lxml')
# finds the total number of cars
count = soup.find('span', {'id': 'ctl00_cphBody_inv1_lblVehicleCount'}).getText()[:2]
count = int(count)
i = 1
for url in range(1,count):
    url = soup.find_all('a', {'id': 'ctl00_cphBody_inv1_rptInventoryNew_ctl0'+i+'_nlVehicleDetailsTitle'})
    print(url['href'])
    i = i + 1
```
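Two things break the loop above: `'ctl0' + i + '...'` concatenates a `str` with an `int`, which raises `TypeError`, and `find_all` returns a list of tags (a `ResultSet`), so `url['href']` cannot look up an attribute on it. Below is a minimal sketch of the incrementing-id approach with both fixed. It runs against a small hypothetical HTML snippet rather than the live page, and it assumes the repeater numbers its items `ctl00`, `ctl01`, ... with two-digit zero padding; that numbering is not verified against the site, and `detail_links` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the inventory page; the real ids and
# their numbering scheme are assumptions, not taken from the live site.
SAMPLE_HTML = """
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl00_nlVehicleDetailsTitle" href="car_a.veh">A</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl01_nlVehicleDetailsTitle" href="car_b.veh">B</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl02_nlVehicleDetailsTitle" href="car_c.veh">C</a>
"""


def detail_links(soup, count):
    """Hypothetical helper: build each id from a zero-padded index and collect hrefs."""
    links = []
    for i in range(count):
        # Format the int into the string ('ctl0' + i raises TypeError);
        # :02d gives 'ctl10' rather than 'ctl010' once i reaches 10.
        tag_id = f"ctl00_cphBody_inv1_rptInventoryNew_ctl{i:02d}_nlVehicleDetailsTitle"
        # find() returns a single Tag (or None), so ['href'] works on it,
        # unlike the list returned by find_all().
        tag = soup.find("a", id=tag_id)
        if tag is not None:
            links.append(tag["href"])
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(detail_links(soup, 3))  # ['car_a.veh', 'car_b.veh', 'car_c.veh']
```

Indices with no matching tag are skipped, so passing a count that is slightly too large is harmless.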
```python
import re
import urllib.request

import bs4 as bs

source = urllib.request.urlopen('http://www.25thstauto.com/inventory.aspx?cursort=asc&pagesize=500').read()
soup = bs.BeautifulSoup(source, 'lxml')

# Match every <a> whose id contains the repeater prefix, whatever its index.
for a in soup.find_all('a', id=re.compile('ctl00_cphBody_inv1_rptInventoryNew')):
    print(a.get('href'))
```
Output:

```
2008_Chevrolet_Malibu_Easton_PA_265928462.veh
2008_Chevrolet_Malibu_Easton_PA_265928462.veh
2008_Chevrolet_Malibu_Easton_PA_265928462.veh
2002_Nissan_Xterra_Easton_PA_266894015.veh
2002_Nissan_Xterra_Easton_PA_266894015.veh
2002_Nissan_Xterra_Easton_PA_266894015.veh
2009_Chevrolet_Cobalt_Easton_PA_265621796.veh
2009_Chevrolet_Cobalt_Easton_PA_265621796.veh
```
This uses a regular expression to find the `a` tags whose `id` attribute contains `ctl00_cphBody_inv1_rptInventoryNew`.
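A rough illustration of the regex filter on hypothetical markup. The repeated lines in the output suggest that each vehicle has more than one `a` tag whose id shares the prefix (for example a photo link and a title link); that is an inference, and the `lnkPhoto` and `lnkHome` ids below are invented to mimic it. `dict.fromkeys` is one way to collapse the repeats while keeping document order.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: two links per vehicle share the repeater prefix,
# plus one unrelated link that the pattern should skip.
SAMPLE_HTML = """
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl00_lnkPhoto" href="2008_Malibu.veh">photo</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl00_nlVehicleDetailsTitle" href="2008_Malibu.veh">title</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl01_lnkPhoto" href="2002_Xterra.veh">photo</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl01_nlVehicleDetailsTitle" href="2002_Xterra.veh">title</a>
<a id="ctl00_header_lnkHome" href="default.aspx">home</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# With a compiled regex as an attribute filter, Beautiful Soup calls
# .search() on each id, so the pattern may occur anywhere in the id.
pattern = re.compile("ctl00_cphBody_inv1_rptInventoryNew")
hrefs = [a.get("href") for a in soup.find_all("a", id=pattern)]

# dict.fromkeys drops repeated hrefs while preserving first-seen order.
unique_hrefs = list(dict.fromkeys(hrefs))
print(unique_hrefs)  # ['2008_Malibu.veh', '2002_Xterra.veh']
```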
Or use a CSS selector:
```python
for a in soup.select('a[id*=ctl00_cphBody_inv1_rptInventoryNew]'):
    print(a.get('href'))
```
The idea is the same: `[id*=...]` matches any `id` that contains the given substring.
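A small sketch, on hypothetical markup, checking that the CSS selector and the regex filter pick the same tags. It also shows `[id^=...]`, a standard CSS attribute selector not used in the answer, which requires the id to *start* with the prefix rather than merely contain it. `soup.select` relies on the soupsieve package, which recent Beautiful Soup releases normally install alongside bs4.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; only the first two ids contain the repeater prefix.
SAMPLE_HTML = """
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl00_nlVehicleDetailsTitle" href="a.veh">A</a>
<a id="ctl00_cphBody_inv1_rptInventoryNew_ctl01_nlVehicleDetailsTitle" href="b.veh">B</a>
<a id="ctl00_footer_lnkContact" href="contact.aspx">Contact</a>
"""

PREFIX = "ctl00_cphBody_inv1_rptInventoryNew"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Substring match with a CSS attribute selector...
via_css = [a.get("href") for a in soup.select(f"a[id*={PREFIX}]")]
# ...the equivalent regex filter...
via_regex = [a.get("href") for a in soup.find_all("a", id=re.compile(PREFIX))]
# ...and a stricter starts-with match.
via_prefix = [a.get("href") for a in soup.select(f"a[id^={PREFIX}]")]

print(via_css == via_regex == via_prefix)  # True
print(via_css)  # ['a.veh', 'b.veh']
```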