
How to scrape information from tables selecting each of the Dropdown options using Selenium and Python?

I'm trying to help someone who works for a nonprofit, and I'm currently trying to pull info from the STL County Boards/Commissions website (https://boards.stlouisco.com/).

I'm having trouble for a few reasons:

I was going to use BeautifulSoup, but the actual data isn't shown until you choose a Board/Commission from the dropdown above, so I have switched to Selenium, which I am new to.

Is this task possible? When I look at the HTML for the site, I see that the info isn't stored in the page; it is pulled from another location and displayed based on the option chosen from the dropdown menu.

function ShowMemberList(selectedBoard) {
        ClearMeetingsAndMembers();
        var htmlString = "";
        var boardsList = [{"id":407,"name":"Aging Ahead","isActive":true,"description":"... ...1.","totalSeats":14}];
        var totalMembers = boardsList[$("select[name='BoardsList'] option:selected").index() - 1].totalSeats;
        $.get("/api/boards/" + selectedBoard + "/members", function (data) {
            if (data.length > 0) {
                htmlString += "<table id=\"MemberTable\" class=\"table table-hover\">";
                htmlString += "<thead><th>Member Name</th><th>Title</th><th>Position</th><th>Expiration Date</th></thead><tbody>";
                for (var i = 0; i < totalMembers; i++) {
                    if (i < data.length) {
                        htmlString += "<tr><td>" + FormatString(data[i].firstName) + " " + FormatString(data[i].lastName) + "</td><td>" + FormatString(data[i].title) + "</td><td>" + FormatString(data[i].position) + "</td><td>" + FormatString(data[i].expirationDate) + "</td></tr>";
                    } else {
                        htmlString += "<tr><td colspan=\"4\">---Vacant Seat---</td></tr>" 
                    }
                }
                htmlString += "</tbody></table>";
            } else {
                htmlString = "<span id=\"MemberTable\">There was no data found for this board.</span>";
            }
            $("#Results").append(htmlString);
        });
    }
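For readers more at home in Python, the row-building loop of ShowMemberList can be mirrored roughly as below. This is only a sketch: `format_string` is a hypothetical stand-in for the page's FormatString helper, which is not shown in the snippet, and is assumed here to turn null values into empty strings.

```python
VACANT = "---Vacant Seat---"


def format_string(value):
    """Hypothetical stand-in for the page's FormatString (not shown in the
    snippet); assumed to map null/None to an empty string."""
    return "" if value is None else str(value)


def build_member_rows(members, total_seats):
    """Mirror the loop in ShowMemberList: one row per seat, filled seats first."""
    if not members:
        # The page shows a "no data found" message instead of a table here.
        return []
    rows = []
    for i in range(total_seats):
        if i < len(members):
            m = members[i]
            name = (format_string(m.get("firstName")) + " "
                    + format_string(m.get("lastName")))
            rows.append((name,
                         format_string(m.get("title")),
                         format_string(m.get("position")),
                         format_string(m.get("expirationDate"))))
        else:
            # Seats beyond the returned members are rendered as vacant.
            rows.append((VACANT,))
    return rows
```

Note that, like the JavaScript, this stops at `total_seats`, so any members beyond that count would not be listed.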

So far, I have this (not a lot), which goes to the page and selects every board from the list:

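from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait as wait

driver = webdriver.Chrome()
driver.get("https://boards.stlouisco.com/")
select = Select(wait(driver, 10).until(EC.presence_of_element_located((By.ID, 'BoardsList'))))
options = select.options

for board in options:
    select.select_by_visible_text(board.text)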

From here I would like to scrape the info from the MemberTable, but I don't know how to move forward, whether it is within the scope of my abilities, or even whether it is possible with Selenium.

I've tried using find_by with a few different elements to click on the members table, but I am met with errors. I have also tried calling for the MemberTable after my select, but it is not able to find that element. Any tips/pointers/advice is appreciated!

To choose each Board/Commission from the dropdown and scrape the page, you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following locator strategy:

Code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 takes the driver path through Service; Selenium 3 used executable_path=...
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get("https://boards.stlouisco.com/")
# Wait until the dropdown is clickable, then click every option in turn.
select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'BoardsList'))))
for option in select.options:
    option.click()
    print("Scrapping :"+option.text)

Console Output:

Scrapping :---Choose a Board---
Scrapping :Aging Ahead
Scrapping :Aging Ahead Advisory Council
Scrapping :Air Pollution & Noise Control Appeal Board
Scrapping :Animal Care & Control Advisory Board
Scrapping :Bi-State Development Agency (Metro)
Scrapping :Board Of Examiners For Mechanical Licensing
Scrapping :Board of Freeholders
Scrapping :Boundary Commission
Scrapping :Building Code Review Committee
Scrapping :Building Commission & Board Of Building Appeals
Scrapping :Business Advisory Council
Scrapping :Center for Educational Media
Scrapping :Civil Service Commission
Scrapping :Commission On Disabilities
Scrapping :County Health Advisory Board
Scrapping :Domestic And Family Violence Council
Scrapping :East-West Gateway Council of Governments Board of Directors
Scrapping :Economic Development Collaborative Advisory Board
Scrapping :Economic Rescue Team
Scrapping :Electrical Code Review Committee
Scrapping :Electrical Examiners, Board Of
Scrapping :Emergency Communications System Commission
Scrapping :Equalization, Board Of
Scrapping :Fire Standards Commission
Scrapping :Friends of the Kathy J. Weinman Shelter for Battered Women, Inc.
Scrapping :Fund Investment Advisory Committee
Scrapping :Historic Building Commission
Scrapping :Housing Authority
Scrapping :Housing Resources Commission
Scrapping :Human Relations Commission
Scrapping :Industrial Development Authority Board
Scrapping :Justice Services Advisory Board
Scrapping :Lambert Airport Eastern Perimeter Joint Development Commission
Scrapping :Land Clearance For Redevelopment Authority
Scrapping :Lemay Community Improvement District
Scrapping :Library Board
Scrapping :Local Emergency Planning Committee
Scrapping :Mechanical Code Review Committee
Scrapping :Metropolitan Park And Recreation District Board Of Directors (Great Rivers Greenway)
Scrapping :Metropolitan St. Louis Sewer District
Scrapping :Metropolitan Taxicab Commission
Scrapping :Metropolitan Zoological Park and Museum District Board
Scrapping :Municipal Court Judges
Scrapping :Older Adult Commission
Scrapping :Parks And Recreation Advisory Board
Scrapping :Planning Commission
Scrapping :Plumbing Code Review Committee
Scrapping :Plumbing Examiners, Board Of
Scrapping :Police Commissioners, Board Of
Scrapping :Port Authority Board Of Commissioners
Scrapping :Private Security Advisory Committee
Scrapping :Productive Living Board
Scrapping :Public Transportation Commission of St. Louis County
Scrapping :Regional Arts Commission
Scrapping :Regional Convention & Sports Complex Authority
Scrapping :Regional Convention & Visitors Commission
Scrapping :REJIS Commission
Scrapping :Restaurant Commission
Scrapping :Retirement Board Of Trustees
Scrapping :St. Louis Airport Commission
Scrapping :St. Louis County Children's Service Fund Board
Scrapping :St. Louis County Clean Energy Development Board (PACE)
Scrapping :St. Louis County Workforce Development Board
Scrapping :St. Louis Economic Development Partnership
Scrapping :St. Louis Regional Health Commission
Scrapping :St. Louis-Jefferson Solid Waste Management District
Scrapping :Tax Increment Financing Commission of St. Louis County
Scrapping :Transportation Board
Scrapping :Waste Management Commission
Scrapping :World Trade Center - St. Louis
Scrapping :Zoning Adjustment,  Board of
Scrapping :Zoo-Museum District - Art Museum Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Botanical Garden Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Missouri History Museum Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - St. Louis Science Center Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Zoological Park Subdistrict Board of Commissioners
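
The loop above only selects each option; it does not read the member table yet. A minimal sketch of that missing step follows. It assumes, based on the JavaScript in the question, that the page removes the previous results as soon as a new board is selected and then appends an element with the id MemberTable (either the table or a "no data" span). The function and column names are illustrative, not part of the site or of Selenium.

```python
def rows_to_records(board, rows):
    """Turn lists of cell texts into dicts; pure helper, no browser needed."""
    columns = ["name", "title", "position", "expiration_date"]
    records = []
    for cells in rows:
        if len(cells) == len(columns):
            records.append({"board": board, **dict(zip(columns, cells))})
        else:
            # A single merged cell is how the page marks a vacant seat.
            records.append({"board": board, "name": cells[0] if cells else ""})
    return records


def scrape_all_boards(url="https://boards.stlouisco.com/", timeout=10):
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Chrome()
    records = []
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        select = Select(wait.until(
            EC.element_to_be_clickable((By.ID, "BoardsList"))))
        # Index 0 is the "---Choose a Board---" placeholder, so skip it.
        for index in range(1, len(select.options)):
            select.select_by_index(index)
            board = select.first_selected_option.text
            try:
                table = wait.until(
                    EC.presence_of_element_located((By.ID, "MemberTable")))
            except TimeoutException:
                continue  # nothing rendered in time; move on to the next board
            # The "no data" span has no tbody rows, so it yields no records.
            rows = [[td.text for td in tr.find_elements(By.TAG_NAME, "td")]
                    for tr in table.find_elements(By.CSS_SELECTOR, "tbody tr")]
            records.extend(rows_to_records(board, rows))
    finally:
        driver.quit()
    return records
```

Calling scrape_all_boards() opens Chrome and returns a list of dicts, one per table row. If the old table turns out to linger after a new selection, an extra wait for its staleness would be needed before reading the rows.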


You can use this script to save all members from all boards to CSV:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://boards.stlouisco.com/'
# Same endpoint the page's own JavaScript calls for each board
members_url = 'https://boards.stlouisco.com/api/boards/{}/members'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Every <option> with a value attribute carries a board id
for o in soup.select('#BoardsList option[value]'):
    print(o['value'], o.text)
    data = requests.get(members_url.format(o['value'])).json()
    for d in data:
        # Tag each member record with the board's display name
        all_data.append(dict(board=o.text, **d))

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv')

Prints:

                                                 board  boardMemberId  memberId boardName  ...   lastName                                  title                                           position expirationDate
0                                          Aging Ahead          39003     27007      None  ...   Anderson                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
1                                          Aging Ahead          38963     27797      None  ...     Bauers                                   None  St. Charles County Community Action Agency App...           None
2                                          Aging Ahead          39004     27815      None  ...  Berkowitz                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
3                                          Aging Ahead          38964     27798      None  ...     Biehle                                   None  Jefferson County Community Action Corp. Appointee           None
4                                          Aging Ahead          38581     27597      None  ...     Bowers                                   None               Franklin County Commission Appointee           None
..                                                 ...            ...       ...       ...  ...        ...                                    ...                                                ...            ...
725  Zoo-Museum District - Zoological Park Subdistr...          38863     26745      None  ...       Seat               (Robert R. Hermann, Jr.)                                   St. Louis County     12/31/2019
726  Zoo-Museum District - Zoological Park Subdistr...          38864     26745      None  ...       Seat                        (Winthrop Reed)                                   St. Louis County     12/31/2016
727  Zoo-Museum District - Zoological Park Subdistr...          38669     26745      None  ...       Seat                      (Lawrence Thomas)                                   St. Louis County     12/31/2018
728  Zoo-Museum District - Zoological Park Subdistr...          38670     26745      None  ...       Seat  (Peggy Ritter ) Advisory Commissioner                        Non-Voting St. Louis County     12/31/2019
729  Zoo-Museum District - Zoological Park Subdistr...          38394     27512      None  ...     Wilson                  Advisory Commissioner                       Non-Voting City of St. Louis           None

[730 rows x 9 columns]

And saves data.csv with all boards/members.
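
The members endpoint appears to return only filled seats, since the page's JavaScript in the question derives vacant seats from the totalSeats field of the inline boardsList array. If vacancies matter, a rough sketch for pulling that array out of the page source is shown below. The function names are illustrative, and it assumes the array is still embedded as `var boardsList = [...]` in valid JSON, as in the snippet from the question.

```python
import json
import re


def extract_boards_list(html):
    """Return the list assigned to `boardsList` in the page's inline script."""
    match = re.search(r"var\s+boardsList\s*=\s*", html)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the rest of the script).
    boards, _ = json.JSONDecoder().raw_decode(html, match.end())
    return boards


def vacant_seats(board, member_count):
    """Seats listed for the board minus members returned, never negative."""
    return max(board["totalSeats"] - member_count, 0)
```

With the script above, `extract_boards_list(requests.get(url).text)` would give one dict per board, and `vacant_seats(board, len(data))` could be computed inside the loop once the matching board is looked up by its id.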
