
Web scraping in Python using Selenium

I am new to web scraping and I am facing a problem. In the appending part, my code seems to append only the first row of the table I want to scrape! I am sure I am missing something. Any ideas? Thanks in advance! The code snippet is the following:

driver = visit_main_page()

contents = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]')

tables = contents[0].find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table')

data = {"Date": [], "Time": [], "Place": [], "Latitude": [], "Longitude": [], "Fatalities": [], "Magnitude": []}

for i in tables:

    try:
        dates = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[1]')
        times = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[2]')
        places = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[3]')
        lat = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[4]')
        long = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[5]')
        fat = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[6]')
        magn = driver.find_elements_by_xpath('//*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1]/td[7]')
    except NoSuchElementException:
        print('No such content!')
        pass
    time.sleep(1)

    for d in dates:
        data['Date'].append(d.text)

    for t in times:
        data['Time'].append(t.text)

    for p in places:
        data['Place'].append(p.text)

    for la in lat:
        data['Latitude'].append(la.text)

    for lo in long:
        data['Longitude'].append(lo.text)

    for f in fat:
        data['Fatalities'].append(f.text)

    for m in magn:
        data['Magnitude'].append(m.text)

UPD
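You are using the wrong locators.
All the parameters you are trying to grab start with //*[@id="mw-content-text"]/div[1]/table[2]/tbody/tr[1], which points to the first row of one specific table. Each lookup therefore returns at most one cell, no matter how many times the loop runs.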
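To see the effect of the positional index without opening a browser, here is a minimal sketch using Python's built-in xml.etree.ElementTree, which supports a subset of XPath that includes positional predicates. The tiny document and its dates are made up for illustration and only mimic the shape of the real page.

```python
import xml.etree.ElementTree as ET

# Made-up markup that mimics the page layout: two tables, the second has three rows.
html = """
<div>
  <table><tbody><tr><td>other</td></tr></tbody></table>
  <table>
    <tbody>
      <tr><td>2020-01-01</td><td>10:00</td></tr>
      <tr><td>2020-02-02</td><td>11:00</td></tr>
      <tr><td>2020-03-03</td><td>12:00</td></tr>
    </tbody>
  </table>
</div>
"""
root = ET.fromstring(html)

# tr[1] pins the lookup to the first row, so only one cell comes back.
first_row_only = root.findall("./table[2]/tbody/tr[1]/td[1]")

# Dropping the row index matches the first cell of every row.
all_rows = root.findall("./table[2]/tbody/tr/td[1]")

print([td.text for td in first_row_only])  # ['2020-01-01']
print([td.text for td in all_rows])        # ['2020-01-01', '2020-02-02', '2020-03-03']
```

The same rule applies to the Selenium locators in the question: as long as tr[1] is in the path, find_elements can only ever return the first row.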
To collect the data you are looking for try this:

dates = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[1]")
times = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[2]")
places = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[3]")
lat = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[4]")
long = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[5]")
fat = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[6]")
magn = driver.find_elements_by_xpath("//table[contains(@class,'wikitable')]//tbody//tr//td[7]")


The hard-coded locators are the main problem. The code after them looks correct.
You do not need to get contents and tables with this approach.
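As a possible variation, the cells can also be collected row by row, which keeps the seven columns aligned even if some row has fewer cells than expected. This is only a sketch under a few assumptions: the column order is taken from the data dict in the question, rows_to_columns and scrape_rows are hypothetical helper names, and scrape_rows assumes Selenium 4, where driver.find_elements(By.XPATH, ...) is used in place of find_elements_by_xpath (the older helper was removed in Selenium 4.3).

```python
# Column order assumed from the data dict in the question.
COLUMNS = ["Date", "Time", "Place", "Latitude", "Longitude", "Fatalities", "Magnitude"]


def rows_to_columns(rows):
    """Turn a list of rows (each a list of cell texts) into a dict of column lists."""
    data = {name: [] for name in COLUMNS}
    for cells in rows:
        # Skip header rows (no <td> cells) and rows that are too short to trust.
        if len(cells) < len(COLUMNS):
            continue
        for name, text in zip(COLUMNS, cells):
            data[name].append(text)
    return data


def scrape_rows(driver):
    """Hypothetical helper: read the text of every <td> in every wikitable row."""
    # Imported here so the pure helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    rows = driver.find_elements(By.XPATH, "//table[contains(@class,'wikitable')]//tr")
    return [[td.text for td in row.find_elements(By.XPATH, "./td")] for row in rows]
```

With the driver from the question, usage would look like data = rows_to_columns(scrape_rows(driver)). Whether every table on the page really has these seven columns in this order is something to check against the actual page.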
