
How do I get the next tag?

I am trying to get the headlines that come after each span with the class mvp-cd-date left relative. The headlines are wrapped in h2 tags that come after that span.

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
mytags = mydivs.findNext('h2')  # raises AttributeError: mydivs is a ResultSet (a list), not a single tag
for tag in mytags:
    print(tag.text.strip())

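soup.findAll() returns a ResultSet, which is a list of tags (empty if nothing matches), so you cannot call findNext() on it. However, you can iterate over the tags and call find_next() on each tag separately (findAll and findNext are the older names of find_all and find_next):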

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
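# find_all() returns a ResultSet (a list of Tag objects), so loop over it
mydivs = soup.find_all("span", {"class": "mvp-cd-date left relative"})
for tag in mydivs:
    # find_next() searches forward in the document for the first matching <h2>
    h2 = tag.find_next('h2')
    if h2 is not None:
        print(h2.get_text(strip=True))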

Prints:

BREAKING: Another federal lawmaker dies in Dubai hospital
Cross-Over Night: Enugu Govt bans burning of tyres on roads
Dadiyata: DSS breaks silence as Nigerian govt critic remains missing
CAC: Nigerian govt appoints new Acting Registrar-General
What Buhari told me – Dabiri-Erewa
What soldiers should expect in 2020 – Buratai
Only earthquake can erase Amosun’s legacies in Ogun – Akinlade
Civil War: Militia leader sentenced to 20yrs in prison
2020: Prophet Omale releases prophecies on Buhari, Aisha, Kyari, govs, coup plot
BREAKING: EFCC arrests Shehu Sani
Armed Forces Day: Yobe Governor Buni, donates N40 million for emblem appeal fund
Zamfara govt bans illegal gathering in the state
Agbenu Kacholalo: Colours of culture at Idoma International Carnival 2019 [PHOTOS]
Men of God are too fearful, weak to challenge government activities
2020: Peter Obi sends message to Nigerians
TETFUND: EFCC, ICPC asked to probe agency over alleged corruption
Two inmates regain freedom from Uyo prison
Buhari meets President of AfDB, Adeshina at Aso Rock
New Kogi CP resumes office, promises crime free state
Nothing stops you from paying N30,000 minimum wage to workers – APC challenges Makinde
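The exact headlines above depend on the live page at the time of writing. To try the same find_all() / find_next() pattern without a network request, here is a minimal offline sketch. The HTML snippet is hypothetical and only imitates the "date span, then h2 headline" layout the answer relies on:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the layout: a date <span> followed by an <h2> headline
html = """
<div class="post">
  <span class="mvp-cd-date left relative">2 hours ago</span>
  <h2>First headline</h2>
</div>
<div class="post">
  <span class="mvp-cd-date left relative">3 hours ago</span>
  <h2> Second headline </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", {"class": "mvp-cd-date left relative"})

headlines = []
for span in spans:
    h2 = span.find_next("h2")  # first <h2> after this span, in document order
    if h2 is not None:
        headlines.append(h2.get_text(strip=True))  # strip=True trims surrounding whitespace

print(headlines)
```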

EDIT: This script will scrape headlines from several pages:

import requests
from bs4 import BeautifulSoup

url = 'https://dailypost.ng/hot-news/page/{}/'

for page in range(1, 5):    # <-- pages 1 to 4; change the range to scrape more or fewer pages
    print('Page no.{}'.format(page))
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    mydivs = soup.find_all("span", {"class": "mvp-cd-date left relative"})
    for tag in mydivs:
        h2 = tag.find_next('h2')
        if h2 is not None:
            print(h2.get_text(strip=True))
    print('-' * 80)
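The class lookup used in these scripts compares the whole class attribute string, so it only matches when the classes appear in exactly that order. If the site reorders or adds classes, a CSS selector via select() is more forgiving; it relies on the soupsieve package, which Beautiful Soup 4.7+ installs as a dependency. A small sketch with hypothetical markup where the second span lists the same classes in a different order:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second span has the same classes in a different order
html = """
<span class="mvp-cd-date left relative">today</span><h2>A</h2>
<span class="left relative mvp-cd-date">today</span><h2>B</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: order must match exactly
exact = soup.find_all("span", {"class": "mvp-cd-date left relative"})

# CSS selector: matches any <span> that has the mvp-cd-date class, whatever the order
by_css = soup.select("span.mvp-cd-date")

exact_titles = [s.find_next("h2").get_text(strip=True) for s in exact]
css_titles = [s.find_next("h2").get_text(strip=True) for s in by_css]
print(exact_titles)
print(css_titles)
```

Here the exact-string search finds only the first span, while the selector finds both.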

Try replacing the last 3 lines with:

for div in mydivs:
    mytags = div.findNext('h2')  # a single Tag (or None), not a list
    if mytags is not None:
        print(mytags.get_text(strip=True))

You must iterate through mydivs to use findNext()
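To see why the loop is needed, here is a small offline sketch (the markup and the class name d are hypothetical) showing that find_all() hands back a ResultSet, which has no find_next() method, while each element inside it does:

```python
from bs4 import BeautifulSoup

html = '<span class="d">x</span><h2>One</h2><span class="d">y</span><h2>Two</h2>'
soup = BeautifulSoup(html, "html.parser")
mydivs = soup.find_all("span", class_="d")

# Calling find_next() on the ResultSet itself fails: it is a list, not a Tag
try:
    mydivs.find_next("h2")
    error = None
except AttributeError as exc:
    error = exc

# Calling find_next() on each element works
titles = [div.find_next("h2").get_text() for div in mydivs]
print(error)
print(titles)
```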

mydivs is a ResultSet, which is a list of Tag objects. findNext() only works on a single Tag, so you must iterate through mydivs and call findNext() on each element.

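Just add this line

for div in mydivs:

before the findNext() call, indent that call under the new loop, and call it on the loop variable instead of on mydivs:

mytags = div.findNext('h2')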

Here is the full code for your working program:

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
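for div in mydivs:
    mytags = div.findNext('h2')  # a single Tag (or None), not a list
    if mytags is not None:
        # get_text() also handles headlines wrapped in nested tags such as <a>
        print(mytags.get_text(strip=True))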
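One more detail about findNext(): it returns a single Tag (or None), not a list. Iterating over a Tag walks its direct children, and a child can itself be a Tag (for example a link wrapping the headline text). Calling .strip() only behaves like the string method when the child is plain text, so get_text() is the safer way to read the headline. A small sketch, assuming hypothetical markup where the headline sits inside an <a>:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical headline whose text is wrapped in a link
html = '<span class="d">x</span><h2><a href="/story">Big news</a></h2>'
soup = BeautifulSoup(html, "html.parser")

h2 = soup.find("span", class_="d").find_next("h2")  # one Tag, not a list

# Looping over a Tag yields its direct children: here a single <a> Tag, not text
children = list(h2)

# get_text() collects all text inside the tag, however deeply it is nested
headline = h2.get_text(strip=True)
print(children)
print(headline)
```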
