BeautifulSoup object contents to string

I am trying to extract the table and table-header elements from a web page. The table elements extract without any issues, but I cannot get the h2 elements as individual strings. I end up with either a list of BeautifulSoup objects or one long string containing the text of every h2 element. How can I extract the h2 elements into a table or list of individual string objects?

import requests
import pandas as pd
from bs4 import BeautifulSoup

scr = 'https://tv.varsity.com/results/7361971-2022-spirit-unlimited-battle-at-the-boardwalk-atlantic-city-grand-ntls/31220'

scr1 = requests.get(scr)
soup = BeautifulSoup(scr1.text, "html.parser")
sp3 = soup.find(class_="full-content").find_all("h2")

Here are two methods I have tried so far.

comp = pd.DataFrame(sp3[0], dtype=str)
div1a = div.drop(div.iloc[0].name)
div2a = div1a.drop(div1a.iloc[0].name)

I also tried a for loop:

data = []
for a in soup.find(class_="full-content").find_all("h2"):
    a = str(a.text)
    data.append(a)

x = ",".join(map(str, data))
print(x)

Thank you for the help!

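You can use a list comprehension to collect the text of each h2 element into a list, or a for loop that iterates over the h2 elements. In both cases, each list item is already a separate Python string; it is the ",".join(...) call in your loop version that merges them into one long string.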

import requests
from bs4 import BeautifulSoup

url = 'https://tv.varsity.com/results/7361971-2022-spirit-unlimited-battle-at-the-boardwalk-atlantic-city-grand-ntls/31220'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
sp3 = soup.find(class_="full-content").find_all("h2")
headers = [elt.text for elt in sp3]
print(headers)

Output:

['2022 Spirit Unlimited: Battle at the Boardwalk Atlantic City Grand Ntls Nationals Results',
'Level 5 & 6 Results', 'L5 Junior', ...
]
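
For a version that can be run without hitting the network, here is a minimal sketch of the same idea. The HTML snippet and the heading names in it are hypothetical stand-ins, and it assumes the real page wraps its h2 elements in an element with class "full-content", as in the question. It uses get_text(strip=True) rather than .text so that surrounding whitespace is trimmed, and it guards against find() returning None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure is assumed, not verified.
html = """
<div class="full-content">
  <h2>  L5 Junior  </h2>
  <h2>L5 Senior</h2>
  <h2>L6 Junior</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find(class_="full-content")

# find() returns None when nothing matches, so guard before calling find_all().
h2_tags = container.find_all("h2") if container is not None else []

# get_text(strip=True) returns a plain str with leading/trailing whitespace removed.
headers = [tag.get_text(strip=True) for tag in h2_tags]

print(headers)
print(all(isinstance(h, str) for h in headers))
```

If a table is wanted instead of a list, passing the list to pandas, for example pd.DataFrame({"header": headers}), should give one row per heading.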

