
How Can I Scrape a Website with BeautifulSoup

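I am trying to scrape a list from a website, but the dealers I want to pull individually are not wrapped in tags of their own. Is there any way to extract them one at a time rather than as a single list?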

This is the website I am trying to extract from:

http://www.autodealerdirectory.us/ca_s_madd.html

import requests
from bs4 import BeautifulSoup

url = 'http://www.autodealerdirectory.us/ca_s_madd.html'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')

dealers = []

# Each dealer block follows an <hr> inside #bodyText; skip the first <hr>.
for tag in soup.select('#bodyText hr')[1:]:
    s = ''
    # The dealer's lines are bare text nodes separated by <br> tags, so
    # every second sibling is text. This assumes four text lines per dealer.
    s += tag.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling
    dealers.append(s)

for dealer in dealers:
    print(dealer.strip())
    print('-----------------------------------------')

This will do the job. Each dealer's information ends up as one string in the dealers list; you just need to clean up the strings.
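
If some dealers have more or fewer than four lines, the fixed next_sibling chain can miss text or hit a tag instead of a string. A possible variation, offered only as a sketch, is to walk the siblings after each hr until the next hr and keep every non-empty text node, which also takes care of the string cleanup. The SAMPLE_HTML markup and the extract_dealers helper below are made up for illustration and only imitate the layout the code above assumes; the real page may differ. The sketch uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup imitating the assumed layout: bare text lines
# separated by <br>, with an <hr> between dealers. Not the real page.
SAMPLE_HTML = """
<div id="bodyText">
<h1>Dealers</h1>
<hr>
Intro text
<hr>
  Example Motors <br>
  123 Main St <br>
  Sacramento, CA 95814 <br>
  (555) 010-0000
<hr>
  Sample Auto Sales <br>
  456 Oak Ave <br>
  Fresno, CA 93701
<hr>
</div>
"""


def extract_dealers(html):
    """Return one list of cleaned text lines per dealer (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    dealers = []
    # Skip the first <hr>, as in the answer above.
    for hr in soup.select('#bodyText hr')[1:]:
        lines = []
        for sibling in hr.next_siblings:
            # The next <hr> marks the start of the next dealer.
            if getattr(sibling, 'name', None) == 'hr':
                break
            # Keep only text nodes; <br> and other tags are ignored.
            if isinstance(sibling, NavigableString):
                text = ' '.join(sibling.split())  # collapse whitespace
                if text:
                    lines.append(text)
        if lines:
            dealers.append(lines)
    return dealers


if __name__ == '__main__':
    for dealer in extract_dealers(SAMPLE_HTML):
        print('\n'.join(dealer))
        print('-----------------------------------------')
```

To try this against the live page, pass r.text from the requests call above to extract_dealers. Whether the [1:] slice and the #bodyText selector still match depends on the page's current markup.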

