Scraping data with BS4 - text strip() not working

I am looking to scrape some publicly available data from one of the technology/analyst research firms.

I have gotten as far as printing out the title and position, but the text.strip() call has not worked. I am probably missing something obvious.

import requests
from bs4 import BeautifulSoup
from requests.api import head

# get the data
data = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")

# load data into bs4
soup = BeautifulSoup(data.text, "html.parser")

analyst_data = soup.find("div", { "class": "col-md-9" })
#print(analyst_data)
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title,header_paragraph)

for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name , position)

You have already found the tags you need when doing:

header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")

Therefore, there is no point in creating this for loop:

for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name , position)
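
To see why the loop cannot work, here is a minimal offline sketch. The SAMPLE_HTML string is a hypothetical stand-in for the page (the real markup may differ), so no network request is needed. It shows that find_all() on the h1 returns a list of child tags, which is empty here, and that the loop variable is that list rather than a tag:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="col-md-9">
  <h1> Michele Goetz </h1>
  <p> VP, Principal Analyst </p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
analyst_data = soup.find("div", {"class": "col-md-9"})
header_title = analyst_data.find("h1")

# find_all() with no arguments returns the tags nested inside the h1.
# This h1 holds only text, so the result list is empty.
children = header_title.find_all()
print(len(children))

# The loop iterates over such result lists, so data.find_all(...) is called
# on a list of tags, not on a single tag, and raises AttributeError.
try:
    children.find_all("h1")
    loop_error = None
except AttributeError as exc:
    loop_error = exc
print(type(loop_error).__name__)
```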

Instead, call .text.strip() directly on header_title and header_paragraph. With your example:

import requests
from bs4 import BeautifulSoup

# get the data
response = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")

# load data into bs4
soup = BeautifulSoup(response.text, "html.parser")

# find() returns a single tag, so .text can be read from it directly
analyst_data = soup.find("div", { "class": "col-md-9" })
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title.text.strip(), header_paragraph.text.strip())

Output:

Michele Goetz VP, Principal Analyst
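
If the page layout changes, find() returns None and .text then raises AttributeError. A slightly more defensive variant might look like the sketch below. The function name extract_name_and_position and the SAMPLE_HTML string are assumptions made for illustration, not part of the original code, and the sample markup only approximates the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="col-md-9">
  <h1> Michele Goetz </h1>
  <p> VP, Principal Analyst </p>
</div>
"""


def extract_name_and_position(html):
    """Return (name, position), or None if an expected tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    analyst_data = soup.find("div", {"class": "col-md-9"})
    if analyst_data is None:
        return None
    header_title = analyst_data.find("h1")
    header_paragraph = analyst_data.find("p")
    if header_title is None or header_paragraph is None:
        return None
    # get_text(strip=True) strips surrounding whitespace from the tag's text.
    return (
        header_title.get_text(strip=True),
        header_paragraph.get_text(strip=True),
    )


result = extract_name_and_position(SAMPLE_HTML)
missing = extract_name_and_position("<div class='other'></div>")
print(result)
print(missing)
```

With live data, the same function could be given response.text from requests.get(...) in place of SAMPLE_HTML.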
