
How can I get Python to navigate to a link and print several data points from this child link?

I am starting from this parent URL:

https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate

From there, I want Python to follow several links, each of which is found at row.find_all('td')[3].a['href']. The first three on the parent page are 'Richard Shelby', 'Doug Jones', and 'Lisa Murkowski'. Every child page contains the text 'Assumed office', and I want to grab all of the dates that follow it. So, for 'Richard Shelby' it would be:

Assumed office
January 3, 1987
Assumed office
April 10, 2018

How can I do that?

For navigating to several different links, I think it will look something like this...

from urllib.parse import urljoin
senator_link = "https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate"

senator_link = row.find_all('td')[3].a['href']
senator_link = urljoin(link, senator_link)
response = session.get(senator_link)

with requests.Session() as session:
    html = session.get(link).text
    soup = BeautifulSoup(response.content, "lxml")
    res = soup.findAll("span", {"class": "nowrap"})
    for r in res:
        print("Assumed Office: " + r.find("span", {'class': 'nowrap'}).text)

All I get with that piece of code is this:

AttributeError: 'NoneType' object has no attribute 'text'
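
The traceback is consistent with how find works: it searches only the descendants of the tag it is called on. Each r is already a span with class nowrap, so looking for another such span inside it returns None, and None has no .text. (The snippet also uses row, link, session and response before they are assigned, which would fail earlier if run as written.) A minimal reproduction, assuming the infobox markup resembles the hypothetical fragment below:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a senator page's infobox row.
html = '<td><span class="nowrap"><b>Assumed office</b></span><br/>January 3, 1987</td>'
soup = BeautifulSoup(html, "html.parser")

spans = soup.find_all("span", {"class": "nowrap"})

# find() looks *inside* the span, where no second nowrap span exists.
inner = spans[0].find("span", {"class": "nowrap"})
print(inner)  # None, so inner.text would raise AttributeError

# Reading the text of the span itself avoids the failing nested lookup.
label = spans[0].get_text(strip=True)
print(label)
```

Note that in this fragment the span holds only the label; the date is a sibling text node, so reading the span alone would not return the date either.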

You can find the table via its id and then loop over the rows, extracting each name and its 'Assumed office' date:

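import requests
from bs4 import BeautifulSoup as soup
# Download and parse the parent page.
d = soup(requests.get('https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate').text, 'html.parser')
# Collect the cell texts of every row in the table with id 'senators' (newline-only cells are dropped); the header row is discarded via `_`.
_, *data = [list(filter(lambda x:x != '\n', [c.text for c in i.find_all('td')])) for i in d.find('table', {'id':'senators'}).find_all('tr')]
# Rows with 7 cells hold the name at index 1, other rows at index 0; the 'Assumed office' date is the second-to-last cell.
final_names = [[(i[1] if len(i) == 7 else i[0]).rstrip(), i[-2].rstrip()] for i in data]
print(final_names)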

Output:

[['Richard Shelby', 'January 3, 1987'], ['Doug Jones[d]', 'January 3, 2018'], ['Lisa Murkowski', 'December 20, 2002'], ['Dan Sullivan', 'January 3, 2015'], ['John McCain', 'January 3, 1987'], ['Jeff Flake', 'January 3, 2013'], ['John Boozman', 'January 3, 2011'], ['Tom Cotton', 'January 3, 2015'], ['Dianne Feinstein', 'November 10, 1992'], ['Kamala Harris', 'January 3, 2017'], ['Michael Bennet', 'January 22, 2009'], ['Cory Gardner', 'January 3, 2015'], ['Richard Blumenthal', 'January 3, 2011'], ['Chris Murphy', 'January 3, 2013'], ['Tom Carper', 'January 3, 2001'], ['Chris Coons', 'November 15, 2010'], ['Bill Nelson', 'January 3, 2001'], ['Marco Rubio', 'January 3, 2011'], ['Johnny Isakson', 'January 3, 2005'], ['David Perdue', 'January 3, 2015'], ['Brian Schatz', 'December 26, 2012'], ['Mazie Hirono', 'January 3, 2013'], ['Mike Crapo', 'January 3, 1999'], ['Jim Risch', 'January 3, 2009'], ['Dick Durbin', 'January 3, 1997'], ['Tammy Duckworth', 'January 3, 2017'], ['Joe Donnelly', 'January 3, 2013'], ['Todd Young', 'January 3, 2017'], ['Chuck Grassley', 'January 3, 1981'], ['Joni Ernst', 'January 3, 2015'], ['Pat Roberts', 'January 3, 1997'], ['Jerry Moran', 'January 3, 2011'], ['Mitch McConnell', 'January 3, 1985'], ['Rand Paul', 'January 3, 2011'], ['Bill Cassidy', 'January 3, 2015'], ['John Kennedy', 'January 3, 2017'], ['Susan Collins', 'January 3, 1997'], ['Angus King', 'January 3, 2013'], ['Ben Cardin', 'January 3, 2007'], ['Chris Van Hollen', 'January 3, 2017'], ['Elizabeth Warren', 'January 3, 2013'], ['Ed Markey', 'July 16, 2013'], ['Debbie Stabenow', 'January 3, 2001'], ['Gary Peters', 'January 3, 2015'], ['Amy Klobuchar', 'January 3, 2007'], ['Tina Smith[e]', 'January 3, 2018'], ['Roger Wicker', 'December 31, 2007'], ['Cindy Hyde-Smith[f]', 'April 9, 2018'], ['Claire McCaskill', 'January 3, 2007'], ['Roy Blunt', 'January 3, 2011'], ['Jon Tester', 'January 3, 2007'], ['Steve Daines', 'January 3, 2015'], ['Deb Fischer', 'January 3, 2013'], ['Ben Sasse', 
'January 3, 2015'], ['Dean Heller', 'May 9, 2011'], ['Catherine Cortez Masto', 'January 3, 2017'], ['Jeanne Shaheen', 'January 3, 2009'], ['Maggie Hassan', 'January 3, 2017'], ['Bob Menendez', 'January 18, 2006'], ['Cory Booker', 'October 31, 2013'], ['Tom Udall', 'January 3, 2009'], ['Martin Heinrich', 'January 3, 2013'], ['Chuck Schumer', 'January 3, 1999'], ['Kirsten Gillibrand', 'January 26, 2009'], ['Richard Burr', 'January 3, 2005'], ['Thom Tillis', 'January 3, 2015'], ['John Hoeven', 'January 3, 2011'], ['Heidi Heitkamp', 'January 3, 2013'], ['Sherrod Brown', 'January 3, 2007'], ['Rob Portman', 'January 3, 2011'], ['Jim Inhofe', 'November 17, 1994'], ['James Lankford', 'January 3, 2015'], ['Ron Wyden', 'February 6, 1996'], ['Jeff Merkley', 'January 3, 2009'], ['Bob Casey Jr.', 'January 3, 2007'], ['Pat Toomey', 'January 3, 2011'], ['Jack Reed', 'January 3, 1997'], ['Sheldon Whitehouse', 'January 3, 2007'], ['Lindsey Graham', 'January 3, 2003'], ['Tim Scott', 'January 2, 2013'], ['John Thune', 'January 3, 2005'], ['Mike Rounds', 'January 3, 2015'], ['Lamar Alexander', 'January 3, 2003'], ['Bob Corker', 'January 3, 2007'], ['John Cornyn', 'December 1, 2002'], ['Ted Cruz', 'January 3, 2013'], ['Orrin Hatch', 'January 3, 1977'], ['Mike Lee', 'January 3, 2011'], ['Patrick Leahy', 'January 3, 1975'], ['Bernie Sanders', 'January 3, 2007'], ['Mark Warner', 'January 3, 2009'], ['Tim Kaine', 'January 3, 2013'], ['Patty Murray', 'January 3, 1993'], ['Maria Cantwell', 'January 3, 2001'], ['Joe Manchin', 'November 15, 2010'], ['Shelley Moore Capito', 'January 3, 2015'], ['Ron Johnson', 'January 3, 2011'], ['Tammy Baldwin', 'January 3, 2013'], ['Mike Enzi', 'January 3, 1997'], ['John Barrasso', 'June 25, 2007']]
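
The approach above reads the dates from the parent table. If you want to follow each child link as the question describes, something along these lines might work. It is only a sketch under several assumptions: it takes the question's ('td')[3] index at face value (rows that share a cell through rowspan may put the name elsewhere), it assumes each 'Assumed office' label sits in the same table cell as its date, and the helper names, the User-Agent string and the limit parameter are made up for illustration.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PARENT_URL = "https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate"


def senator_links(parent_html, base_url=PARENT_URL):
    """Return absolute URLs taken from the fourth <td> of each table row."""
    soup = BeautifulSoup(parent_html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows and rows whose fourth cell has no link.
        if len(cells) > 3 and cells[3].a and cells[3].a.get("href"):
            links.append(urljoin(base_url, cells[3].a["href"]))
    return links


def assumed_office_dates(child_html):
    """Return the text that follows each 'Assumed office' label."""
    soup = BeautifulSoup(child_html, "html.parser")
    dates = []
    for label in soup.find_all(string=re.compile(r"Assumed\s+office")):
        # Assumption: the label and its date share one table cell.
        cell = label.find_parent(["td", "th"])
        if cell is None:
            continue
        text = cell.get_text(" ", strip=True)
        # Drop everything up to and including the label, keeping the date.
        dates.append(re.sub(r"^.*?Assumed\s+office\s*", "", text))
    return dates


def print_assumed_office_dates(limit=3):
    """Fetch the parent page, follow the first `limit` links, print the dates."""
    with requests.Session() as session:
        # Hypothetical identifying string; adjust to describe your own script.
        session.headers.update({"User-Agent": "assumed-office-example/0.1"})
        parent_html = session.get(PARENT_URL).text
        for url in senator_links(parent_html)[:limit]:
            for date in assumed_office_dates(session.get(url).text):
                print("Assumed office:", date)
```

Calling print_assumed_office_dates() performs the network requests. The two parsing helpers take plain HTML strings, so they can be checked against a saved copy of a page before anything is fetched.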
