
Beautiful Soup scraping with Selenium

I'm learning how to scrape using Beautiful Soup with Selenium, and I found a website that has multiple tables built from table tags (this is my first time dealing with them). I'm trying to scrape the text from each table and append each element to its respective list. First I'm trying to scrape the first table; the rest I want to do on my own. But for some reason I cannot access the tag.

I also incorporated Selenium to access the site, because when I copy the link to the results page into another tab, the list of tables disappears for some reason.

My code so far:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from selenium import webdriver
from selenium.webdriver.support.ui import Select

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

targetSite =  "https://www.sdvisualarts.net/sdvan_new/events.php"
driver.get(targetSite)

select_event = Select(driver.find_element_by_name('subs'))
select_event.select_by_value('All')

select_loc = Select(driver.find_element_by_name('loc'))
select_loc.select_by_value("All")

driver.find_element_by_name("submit").click()


targetSite   = "https://www.sdvisualarts.net/sdvan_new/viewevents.php"
event_title = []
name = []
address = []
city = []
state = []
zipCode = []
location = []
webSite = []
fee = []
event_dates = []
opening_dates = []
description = []

try:
    page = requests.get(targetSite )
    soup = BeautifulSoup(page.text, 'html.parser')
    items = soup.find_all('table', {"class":"popdetail"})
    for i in items:
        event_title.append(item.find('b', {'class': "text"})).text.strip()
        name.append(item.find('td', {'class': "text"})).text.strip()
        address.append(item.find('td', {'class': "text"})).text.strip()
        city.append(item.find('td', {'class': "text"})).text.strip()
        state.append(item.find('td', {'class': "text"})).text.strip()
        zipCode.append(item.find('td', {'class': "text"})).text.strip()

Can someone let me know if I am doing this correctly? This is my first time dealing with a site whose elements disappear when its URL is copied into a new tab or window.

So far, I am unable to append any information to any of the lists.

One issue is with the for loop.

You have for i in items:, but inside the loop body you refer to item instead of i, and item is never defined.
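For illustration, here is a minimal sketch of the loop with a consistent variable name. It also moves .text.strip() inside the append() call: in the question's code it is applied to the return value of list.append(), which is always None. The SAMPLE_HTML markup and the extract_titles helper are made up for this example and are only assumed to resemble the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<table class="popdetail"><tr><td><b class="text"> First Event </b></td></tr></table>
<table class="popdetail"><tr><td><b class="text">Second Event</b></td></tr></table>
<table class="popdetail"><tr><td>no title here</td></tr></table>
"""


def extract_titles(html):
    """Collect the stripped text of the first <b class="text"> in each popdetail table."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # The loop variable is used consistently inside the body.
    for item in soup.find_all("table", {"class": "popdetail"}):
        tag = item.find("b", {"class": "text"})
        # find() returns None when nothing matches, so guard before reading .text.
        titles.append(tag.text.strip() if tag is not None else None)
    return titles


event_title = extract_titles(SAMPLE_HTML)
print(event_title)
```

Note that this only fixes the loop itself. It does not address the nested tables or the fact that requests fetches the page separately from the Selenium session.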

Secondly, if you are using Selenium to render the page, you should probably also use Selenium to get the HTML (driver.page_source) rather than making a separate requests call. The page also has tables embedded within tables, so it's not as straightforward as iterating through the <table> tags. What I ended up doing was having pandas read in the tables (pd.read_html returns a list of DataFrames) and then iterating through those, since the DataFrames follow a consistent pattern.
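The pattern can be sketched in isolation with hand-built DataFrames instead of a live page. The table layout below (a one-column "Event Title" table followed immediately by a two-column key/value table) is an assumption made for this example, and lookup is a hypothetical helper, not part of pandas. The real page needs different offsets.

```python
import pandas as pd

# Hand-built stand-ins for what pd.read_html might return; the layout is assumed.
dfs = [
    pd.DataFrame([["Event Title"], ["Sample Show"]]),
    pd.DataFrame([["Name", "Sample Gallery"], ["City", "San Diego"]]),
    pd.DataFrame([["Event Title"], ["Another Show"]]),
    pd.DataFrame([["Name", "Other Venue"], ["City", "La Jolla"]]),
]


def lookup(table, key):
    """Return the value in column 1 of the row whose column 0 equals key, else None."""
    keyed = table.set_index(0)
    return keyed.loc[key, 1] if key in keyed.index else None


rows = []
for idx, table in enumerate(dfs):
    # A table whose first cell is 'Event Title' marks the start of one event.
    if table.iloc[0, 0] == "Event Title":
        details = dfs[idx + 1]  # assumed offset for this toy layout
        rows.append({
            "event_title": table.iloc[-1, 0],
            "name": lookup(details, "Name"),
            "city": lookup(details, "City"),
            "state": lookup(details, "State"),  # key absent here, so None
        })

df = pd.DataFrame(rows)
print(df)
```

Collecting one dict per event keeps the columns aligned even when a key is missing, whereas a direct .loc lookup would raise a KeyError for an absent row label.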

import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select

# Raw string so the backslashes in the Windows path are not treated as escape sequences
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

targetSite =  "https://www.sdvisualarts.net/sdvan_new/events.php"
driver.get(targetSite)

select_event = Select(driver.find_element_by_name('subs'))
select_event.select_by_value('All')

select_loc = Select(driver.find_element_by_name('loc'))
select_loc.select_by_value("All")

driver.find_element_by_name("submit").click()


targetSite   = "https://www.sdvisualarts.net/sdvan_new/viewevents.php"
event_title = []
name = []
address = []
city = []
state = []
zipCode = []
location = []
webSite = []
fee = []
event_dates = []
opening_dates = []
description = []

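# Parse every <table> in the Selenium-rendered page into a list of DataFrames.
# (Newer pandas versions prefer a file-like object, e.g. io.StringIO(driver.page_source).)
dfs = pd.read_html(driver.page_source)
driver.close()  # must be called; a bare driver.close does nothing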

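# Each event starts with a table whose first cell is 'Event Title';
# the related detail tables sit at fixed offsets after it.
for idx, table in enumerate(dfs):
    if table.iloc[0,0] == 'Event Title':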
        event_title.append(table.iloc[-1,0])
        tempA = dfs[idx+1]
        tempA.index = tempA[0]
        
        tempB = dfs[idx+4]
        tempB.index = tempB[0]
        
        tempC = dfs[idx+5]
        tempC.index = tempC[0]
        
        name.append(tempA.loc['Name',1])
        address.append(tempA.loc['Address',1])
        city.append(tempA.loc['City',1])
        state.append(tempA.loc['State',1])
        zipCode.append(tempA.loc['Zip',1])
        location.append(tempA.loc['Location',1])
        webSite.append(tempA.loc['Web Site',1])
        
        fee.append(tempB.loc['Fee',1])
        event_dates.append(tempB.loc['Dates',1])
        opening_dates.append(tempB.loc['Opening Days',1])
        
        description.append(tempC.loc['Event Description',1])
        
df = pd.DataFrame({'event_title':event_title,
                    'name':name,
                    'address':address,
                    'city':city,
                    'state':state,
                    'zipCode':zipCode,
                    'location':location,
                    'webSite':webSite,
                    'fee':fee,
                    'event_dates':event_dates,
                    'opening_dates':opening_dates,
                    'description':description})

Output:

print (df.to_string())
                                          event_title                            name                                    address         city       state zipCode             location                                            webSite                                                fee                              event_dates                                      opening_dates                                        description
0   The San Diego Museum of Art Welcomes a Special...         San Diego Museum of Art                 1450 El Prado, Balboa Park    San Diego          CA   92101    Central San Diego                            https://www.sdmart.org/                                                NaN    Starts On 6-18-2020 Ends On 1-10-2021  Opens virtually on June 18. The work will beco...  The San Diego Museum of Art is launching its f...
1                New Exhibit: Miller Dairy Remembered  Lemon Grove Historical Society  3185 Olive Street, Treganza Heritage Park  Lemon Grove          CA   91945    Central San Diego                        http://www.lghistorical.org  Children 12 and under free and must be accompa...    Starts On 6-27-2020 Ends On 12-4-2020  Exhibit on view Saturdays 11 am to 2 pm; close...  From 1926 there were cows smack in the midst o...
2                               Gizmos and Shivelight             Distinction Gallery                           317 E. Grand Ave    Escondido          CA   92025  North County Inland                      http://www.distinctionart.com                                                NaN     Starts On 7-14-2020 Ends On 9-5-2020                                08/08/20 - 09/05/20  Distinction Gallery is proud to present our so...
3                  Virtual Opening - July Exhibitions               Vision Art Museum                   2825 Dewey Rd. Suite 100    San Diego          CA   92106    Central San Diego                    http://www.visionsartmuseum.org                                               Free    Starts On 7-18-2020 Ends On 10-4-2020                                                NaN  Join Visions Art Museum for a virtual exhibiti...
4   Laying it Bare: The Art of Walter Redondo and ...             Fresh Paint Gallery                     1020-B Prospect Street     La Jolla          CA   92037    Central San Diego                      http://freshpaintgallery.com/                                                NaN     Starts On 8-1-2020 Ends On 9-27-2020            Tuesday through Sunday. Mondays closed.  A two-person exhibit of new abstract expressio...
5    Online oil painting lessons with Concetta Antico                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN    Starts On 8-10-2020 Ends On 8-31-2020                                                NaN  Anyone can learn to paint like the masters! Ov...
6             MOMENTUM: A Creative Industry Symposium                Vanguard Culture                                   Via Zoom    San Diego  California   92101              Virtual  https://www.eventbrite.com/e/momentum-a-creati...                             $10 suggested donation     Starts On 8-17-2020 Ends On 9-7-2020                                                NaN  MOMENTUM: A Creative Industry Symposium Monday...
7                    Virtual Locals Invitational Show        Art & Frames of Coronado                             936 ORANGE AVE     Coronado          CA   92118                    0  https://www.artsteps.com/view/5eed0ad62cd0d65b...                                               free     Starts On 8-21-2020 Ends On 8-1-2021                                                NaN  Art and Frames of Coronado invites you to our ...
8                                          HERE & Now          R.B. Stevenson Gallery              7661 Girard Avenue, Suite 101     La Jolla  California   92037    Central San Diego                  http://www.rbstevensongallery.com                                               Free    Starts On 8-22-2020 Ends On 9-25-2020                           Tuesday through Saturday  R.B.Stevenson Gallery is pleased to announce t...
9                     Art Unites Learning: Normal 2.0                      Art Unites                                        NaN    San Diego         NaN   92116    Central San Diego    https://www.facebook.com/events/956878098104971                                               Free    Starts On 8-25-2020 Ends On 8-25-2020                                                NaN  Please join us on Tuesday, August 25th as we: ...
10  Image Quest Sojourn; Visual Journaling for Per...        Pamela Underwood Studios                                    Virtual          NaN         NaN     NaN              Virtual  http://www.pamelaunderwood.com/event/new-onlin...                                            $595.00   Starts On 8-26-2020 Ends On 11-11-2020                                                NaN  Create a personal Image Quest resource journal...
11  Behind The Exhibition: Southern California Con...         Oceanside Museum of Art                          704 Pier View Way    Oceanside  California   92054              Virtual  https://oma-online.org/events/behind-the-exhib...            No fee required. Donations recommended.    Starts On 8-27-2020 Ends On 8-27-2020                                                NaN  Join curator Beth Smith and exhibitions manage...
12          Lay it on Thick, a Virtual Art Exhibition    San Diego Watercolor Society                    2825 Dewey Rd Bldg #202    San Diego  California   92106                    0                               https://www.sdws.org                                                NaN    Starts On 8-30-2020 Ends On 9-26-2020                                                NaN  The San Diego Watercolor Society proudly prese...
13      The Forum: Marketing & Branding for Creatives                Vanguard Culture                                   Via Zoom    San Diego          CA   92101      South San Diego                        http://vanguardculture.com/                              $5 suggested donation      Starts On 9-1-2020 Ends On 9-1-2020                                                NaN  Attention creative industry professionals! Joi...
14                       Write or Die Solo Exhibition                 You Belong Here                         3619 EL CAJON BLVD    San Diego          CA   92104    Central San Diego  http://www.youbelongsd.com/upcoming-events/wri...            $10 donation to benefit You Belong Here      Starts On 9-4-2020 Ends On 9-6-2020                                                NaN  Write or Die is an immersive installation and ...
15     SDVAN presents Art San Diego at Bread and Salt   San Diego Visual Arts Network                         1955 Julian Avenue     San Digo          CA   92113    Central San Diego  http://www.sdvisualarts.net and https://www.br...                                               Free    Starts On 9-5-2020 Ends On 10-24-2020                                                NaN  We are pleased to announce the four artist rec...
16               The Coming of Treganza Heritage Park  Lemon Grove Historical Society                          3185 Olive Street  Lemon Grove          CA   91945    Central San Diego                        http://www.lghistorical.org                                  Free for all ages    Starts On 9-10-2020 Ends On 9-10-2020  The park is open daily, 8 am to 8 pm. Covid 19...  Lemon Grove\'s central city park will be renam...
17               Online oil painting course | 4 weeks                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN    Starts On 9-14-2020 Ends On 10-5-2020                                                NaN  Over 4 weekly Zoom lessons, learn the techniqu...
18               Online oil painting course | 4 weeks                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN   Starts On 10-12-2020 Ends On 11-2-2020                                                NaN  Over 4 weekly Zoom lessons, learn the techniqu...
19                    36th Annual Mission Fed ArtWalk             Mission Fed ArtWalk                                 Ash Street    San Diego  California   92101    Central San Diego                          www.missionfedartwalk.org                                               Free    Starts On 11-7-2020 Ends On 11-8-2020                            Sat and Sun Nov 7 and 8  Mission Fed ArtWalk returns to San Diego’s Lit...
20             Mingei Pop Up Workshop: My Daruma Doll            New Childrens Museum                     200 West Island Avenue    San Diego  California   92101    Central San Diego                        http://thinkplaycreate.org/                                Free with admission  Starts On 11-13-2020 Ends On 11-13-2020                                                NaN  Join Mingei International Museum at The New Ch...
