How can I web scrape a website for the winners?

Hi, I am trying to scrape this website with Python 3, but the page source gives no clear indication of how to extract the names of the winners in these primary elections. Can you show me how to scrape a list of all the winners in every MD primary election from this website?

https://elections2018.news.baltimoresun.com/results/

The parsing is a little complicated because the results are spread across many subpages. This script collects them and prints the result (all the data is stored in the variable data):

from bs4 import BeautifulSoup
import requests

url = "https://elections2018.news.baltimoresun.com/results/"

# Maps (race name, party) to a list of (candidate, votes, percent) rows.
data = {}

# The index page has one div per contest, each with an id starting with "race".
soup = BeautifulSoup(requests.get(url).text, 'lxml')
for race in soup.select('div[id^=race]'):
    # The part of the id after the first hyphen names the contest's subpage.
    contest_id = race['id'].split('-')[1]
    page = requests.get(f"{url}contests/{contest_id}.html")
    s = BeautifulSoup(page.text, 'lxml')

    rows = []
    data[(s.find('h3').text, s.find('div', {'class': 'party-header'}).text)] = rows

    for candidate, votes, percent in zip(s.select('td.candidate'), s.select('td.votes'), s.select('td.percent')):
        rows.append((candidate.text, votes.text, percent.text))

print('Winners:')
for (race, party), rows in data.items():
    # The first row of each results table is taken to be the winner.
    print(race, party, rows[0])

# print(data)  # uncomment to see every candidate in every contest

Output:

Winners:
Governor / Lt. Governor Democrat ('Ben Jealous and Susan Turnbull', '227,764', '39.6%')
U.S. Senator Republican ('Tony Campbell', '50,915', '29.2%')
U.S. Senator Democrat ('Ben Cardin', '468,909', '80.4%')
State's Attorney Democrat ('Marilyn J. Mosby', '39,519', '49.4%')
County Executive Democrat ('John "Johnny O" Olszewski, Jr.', '27,270', '32.9%')
County Executive Republican ('Al Redmer, Jr.', '17,772', '55.7%')
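
The script takes the first row of each results table as the winner, which holds only if the page lists candidates in descending vote order. Here is a minimal sketch, assuming the same layout as data (a dict of (candidate, votes, percent) rows keyed by race and party), that picks the winner by parsed vote count instead. The helper names parse_votes and pick_winner and the sample rows are hypothetical placeholders, not real results:

```python
def parse_votes(text):
    """Convert a vote string such as '1,234' to an int."""
    return int(text.replace(',', '').strip())


def pick_winner(rows):
    """Return the row with the most votes, or None if there are no rows."""
    if not rows:
        return None
    return max(rows, key=lambda row: parse_votes(row[1]))


# Hypothetical placeholder rows, not real election results.
sample = {
    ('Example Race', 'Example Party'): [
        ('Candidate A', '1,200', '30.0%'),
        ('Candidate B', '2,800', '70.0%'),
    ],
    ('Empty Race', 'Example Party'): [],
}

winners = {key: pick_winner(rows) for key, rows in sample.items()}
for (race, party), winner in winners.items():
    print(race, party, winner)
```

With the scraped data, pick_winner could replace the first-row lookup in the final loop. It also avoids an IndexError for a contest page that yields no candidate rows.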
