I'm trying to scrape news articles and want all the paragraphs of each article. I used Soup.find_all('p') to collect them, but the result still contains the HTML tags. Since Soup.find_all('p') returns a bs4.element.ResultSet, I can't call methods like .get_text(), .decompose(), or .strip() on it directly. I also can't use Soup.find('p'), because it returns only the first paragraph and I need all of them.
Here is my code:
for story in J:
    page3 = requests.get(story)
    SOUP = BeautifulSoup(page3.content, 'html.parser')
    q = SOUP.find_all('p')
    print(q[0])
Simply iterate over your ResultSet, get the stripped text of each element, and join() the individual texts with a space:
' '.join([p.get_text(strip=True) for p in SOUP.find_all('p')])
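Applied to the loop from the question:
import requests
from bs4 import BeautifulSoup

# J is assumed to be the iterable of article URLs from the question
for story in J:
    page3 = requests.get(story)
    SOUP = BeautifulSoup(page3.content, 'html.parser')
    # Extract the text of every <p> and join the pieces with a space
    t = ' '.join([p.get_text(strip=True) for p in SOUP.find_all('p')])
    print(t)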
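To try this without making any network requests, here is a minimal sketch that runs the same idea on an inline HTML string. The sample markup and the names html, paragraphs, texts, and article_text are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded news page.
html = """
<html><body>
  <h1>Headline</h1>
  <p>  First paragraph.  </p>
  <p>Second paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, a list-like collection of Tag objects.
paragraphs = soup.find_all("p")

# get_text() is a method of each Tag, not of the ResultSet,
# so it has to be called once per element.
texts = [p.get_text(strip=True) for p in paragraphs]

# Join the per-paragraph texts into one string, separated by a space.
article_text = " ".join(texts)
print(article_text)  # prints: First paragraph. Second paragraph.
```

If you would rather keep the paragraphs separate, use the texts list directly, or join with '\n' instead of ' '.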