bs4 python web scraping

Question

I want to extract only the text from this particular div. The structure looks like this:

<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>

and my code is:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
print(names)

How should I change the code so that only the text comes out, i.e. "Crash Landing on You"?

I'm really new to scraping, so please help me out a bit. Also, if there's a good API for scraping wiki tables, please recommend one.

Answer 1

Use the get_text() method to extract the text inside a tag.

for name in names:
    print(name.get_text(strip=True))

Output:

Crash Landing on You
Meow, The Secret Boy
Seven First Kisses
What’s Wrong with Secretary Kim
Touch Your Heart
The Secret Life of My Secretary
Strong Girl Bong-soon
Suspicious Partner
Secret Garden
She Was Pretty
Shopping King Louis
Oh My Venus
My Love from the Star
My First First Love
Legend of the Blue Sea
The Big Hit
Her Private Life
Beating Again
Emergency Couple
Clean with Passion for Now
Be Melodramatic
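
To see why strip=True matters, here is a minimal offline sketch that parses the markup from the question directly instead of fetching the page. The variable names html, raw and clean are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, so no network request is needed.
html = '''<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>'''

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="edgtf-pli-text")

raw = div.get_text()              # keeps the newline that follows the <h4> tag
clean = div.get_text(strip=True)  # strips surrounding whitespace

print(repr(raw))    # '\nCrash Landing on You'
print(repr(clean))  # 'Crash Landing on You'
```

Printing names directly shows the whole tag because find_all() returns Tag objects; get_text() is what turns each one into a plain string.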

Answer 2

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    target = [item.get_text(strip=True) for item in soup.select(
        "h4.edgtf-pli-title.entry-title")]
    print(target)


main("https://kdramaclicks.com/kdrama/romantic-comedy/")

Output:

['Crash Landing on You', 'Meow, The Secret Boy', 'Seven First Kisses', 'What’s Wrong with Secretary Kim', 'Touch Your Heart', 'The Secret Life of My Secretary', 'Strong Girl Bong-soon', 'Suspicious Partner', 'Secret Garden', 'She Was Pretty', 'Shopping King Louis', 'Oh My Venus', 'My Love from the Star', 'My First First Love', 'Legend of the Blue Sea', 'The Big Hit', 'Her Private Life', 'Beating Again', 'Emergency Couple', 'Clean with Passion for Now', 'Be Melodramatic']
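
One thing to watch with get_text(strip=True): when a tag contains nested tags, each text fragment is stripped and the fragments are joined with nothing between them, so words can run together. Passing a separator avoids that. The sketch below uses made-up markup (the <em> tag is an assumption for illustration, not taken from the real page).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a title split across a nested <em> tag.
html = '<h4 class="edgtf-pli-title entry-title">My <em>First</em> First Love</h4>'

soup = BeautifulSoup(html, "html.parser")
h4 = soup.select_one("h4.edgtf-pli-title.entry-title")

joined = h4.get_text(strip=True)       # fragments joined with ""
spaced = h4.get_text(" ", strip=True)  # fragments joined with a single space

print(joined)  # MyFirstFirst Love
print(spaced)  # My First First Love
```

If the titles on a page are plain text inside the h4, both calls return the same string, so adding the separator is harmless.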

Answer 3

You can use the .text attribute of a BeautifulSoup tag and then call .strip() on it to remove the leading "\n" (newline character) from every Korean-drama name.

import requests
from bs4 import BeautifulSoup


page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
for name in names:
    print(name.text.strip())
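
The approaches in the three answers should be interchangeable for markup like this. The following sketch checks that on a small offline sample; the two-item html string is a hypothetical stand-in shaped like the snippet in the question, not the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-item sample shaped like the markup in the question.
html = """
<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>
<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Secret Garden</h4></div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", class_="edgtf-pli-text")

# .text is a property that returns the same string as get_text() with no arguments.
via_text = [d.text.strip() for d in divs]
via_get_text = [d.get_text(strip=True) for d in divs]
# Selecting the <h4> directly with a CSS selector, as in Answer 2.
via_select = [h.get_text(strip=True)
              for h in soup.select("h4.edgtf-pli-title.entry-title")]

print(via_text)  # ['Crash Landing on You', 'Secret Garden']
```

Selecting the h4 is slightly more targeted, which may matter if the div later gains other child elements with their own text.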

3 answers

solution1
1 2020-09-06 10:18:02

solution2
1 2020-09-06 14:44:06

solution3
0 2020-09-06 10:19:27