How to scrape headline news, link and image?

Question

I'd like to scrape each news headline, the link to the article, and the article's picture.

I tried the web-scraping code below, but it only targets the headlines, and it does not work.

import requests
import pandas as pd
from bs4 import BeautifulSoup

nbc_business = "https://news.mongabay.com/list/environment"
res = requests.get(nbc_business, verify=False)
soup = BeautifulSoup(res.content, 'html.parser')

headlines = soup.find_all('h2',{'class':'post-title-news'})
len(headlines)
for i in range(len(headlines)):
    print(headlines[i].text)

Any recommendations would be appreciated.

Answer 1

This happens because the site blocks bots. If you print res.content, you will see that it shows a 403 error.

Add headers={'User-Agent':'Mozilla/5.0'} to the request.

Try the code below:

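import requests
from bs4 import BeautifulSoup

nbc_business = "https://news.mongabay.com/list/environment"
# Send a browser-like User-Agent so the site does not answer with 403
res = requests.get(nbc_business, verify=False, headers={'User-Agent': 'Mozilla/5.0'})

soup = BeautifulSoup(res.content, 'html.parser')

# Each headline sits in an <h2 class="post-title-news"> element
headlines = soup.find_all('h2', class_='post-title-news')
print(len(headlines))
for headline in headlines:
    print(headline.text)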

Answer 2

First things first: never post code as an image.

The <h2> in your HTML has no text of its own. What it does have is an <a> element, so:

for hl in headlines:
    link = hl.find('a')       # the <a> element inside the <h2>
    text = link.text          # headline text
    url = link.attrs['href']  # link to the article
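
Neither answer covers the picture, so here is a minimal sketch that collects headline, link, and image together. It runs against a small hypothetical HTML sample rather than the live page. The `<article>` wrapper, the position of the `<img>`, and the helper name `extract_items` are all assumptions for illustration; inspect the real page and adjust the selectors to match.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<article>
  <img src="/images/forest.jpg" alt="Forest">
  <h2 class="post-title-news"><a href="/2022/11/forest-story/">Forest story</a></h2>
</article>
<article>
  <h2 class="post-title-news"><a href="https://example.com/river">River story</a></h2>
</article>
"""

def extract_items(html, base_url):
    """Return a list of dicts with the headline, url and image of each post."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for h2 in soup.find_all("h2", class_="post-title-news"):
        link = h2.find("a")
        if link is None:
            continue  # skip headlines that carry no link
        # Assumption: the picture lives in the nearest enclosing <article>.
        container = h2.find_parent("article")
        img = container.find("img") if container is not None else None
        has_src = img is not None and img.get("src")
        items.append({
            "headline": link.get_text(strip=True),
            # urljoin turns relative hrefs into absolute URLs
            "url": urljoin(base_url, link.get("href", "")),
            "image": urljoin(base_url, img["src"]) if has_src else None,
        })
    return items

items = extract_items(SAMPLE_HTML, "https://news.mongabay.com/list/environment")
for item in items:
    print(item)
```

To use this on the live site, pass `res.content` from the request in Answer 1 instead of `SAMPLE_HTML`. The resulting list of dicts can also be handed to `pd.DataFrame(items)` if a table is wanted.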

solution1
1 ACCEPTED 2022-11-16 09:23:13

solution2
0 2022-11-16 09:36:51
