I am trying to scrape some publicly available data from one of the technology/analyst research firms.
So far I can print the title and position elements, but calling .text.strip()
on them has not worked. I am probably missing something obvious.
import requests
from bs4 import BeautifulSoup
from requests.api import head
# get the data
data = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")
# load data into bs4
soup = BeautifulSoup(data.text, "html.parser")
analyst_data = soup.find("div", { "class": "col-md-9" })
#print(analyst_data)
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title,header_paragraph)
for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name, position)
You have already found a tag when doing:
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
so there is no point in writing this for loop:
for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name, position)
Instead, call .text on header_title and header_paragraph. With your example:
import requests
from bs4 import BeautifulSoup
# get the data
data = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")
# load data into bs4
soup = BeautifulSoup(data.text, "html.parser")
analyst_data = soup.find("div", { "class": "col-md-9" })
#print(analyst_data)
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title.text.strip(), header_paragraph.text.strip())
Output:
Michele Goetz VP, Principal Analyst
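If you want the extraction to survive a page where one of the elements is missing, here is a minimal sketch of the same idea. It runs against an inline HTML string instead of the live site, so the `SAMPLE_HTML` markup and the `extract_name_and_position` helper are assumptions made for illustration; the real page may be structured differently. It uses `get_text(strip=True)`, which gives the same result as `.text.strip()` for a tag holding a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real page may differ.
SAMPLE_HTML = """
<div class="col-md-9">
  <h1> Michele Goetz </h1>
  <p> VP, Principal Analyst </p>
</div>
"""


def extract_name_and_position(html):
    """Return (name, position); a part that cannot be found is None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "col-md-9"})
    if container is None:
        # find() returns None when nothing matches, so stop here
        return None, None

    # find() returns a single Tag (or None), so no loop is needed
    title = container.find("h1")
    paragraph = container.find("p")

    name = title.get_text(strip=True) if title is not None else None
    position = paragraph.get_text(strip=True) if paragraph is not None else None
    return name, position


name, position = extract_name_and_position(SAMPLE_HTML)
print(name, position)
```

The `None` checks matter because `find()` returns `None` when no element matches, and calling `.text` on `None` raises an `AttributeError`.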