bs4 python web scraping

Question

I only want to access the text inside this particular div. The structure looks like this:

<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>

My code is:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
print(names)

How should I change the code so that only the text appears, i.e. "Crash Landing on You"?

I'm really new to scraping, so please bear with me. Also, if there is a good API for scraping wiki tables, please recommend one.

Answer 1

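Use the get_text() method to extract the text inside each tag. Passing strip=True also removes the surrounding whitespace, including the newline that precedes each title.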

for name in names:
    print(name.get_text(strip=True))

Crash Landing on You
Meow, The Secret Boy
Seven First Kisses
What’s Wrong with Secretary Kim
Touch Your Heart
The Secret Life of My Secretary
Strong Girl Bong-soon
Suspicious Partner
Secret Garden
She Was Pretty
Shopping King Louis
Oh My Venus
My Love from the Star
My First First Love
Legend of the Blue Sea
The Big Hit
Her Private Life
Beating Again
Emergency Couple
Clean with Passion for Now
Be Melodramatic
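
To try this without fetching the page, here is a minimal sketch that parses the markup from the question as an inline string. The variable names raw and clean are only for illustration.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, so no network request is needed.
html = '''<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='edgtf-pli-text')

raw = div.get_text()              # keeps the newline that follows the opening <h4> tag
clean = div.get_text(strip=True)  # strips whitespace around each piece of text

print(repr(raw))
print(repr(clean))
```

The first print shows the leading newline that appears when strip=True is omitted; the second shows only the title.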

Answer 2

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Select every <h4> that carries both classes and keep only its stripped text.
    target = [item.get_text(strip=True) for item in soup.select(
        "h4.edgtf-pli-title.entry-title")]
    print(target)


main("https://kdramaclicks.com/kdrama/romantic-comedy/")

Output:

['Crash Landing on You', 'Meow, The Secret Boy', 'Seven First Kisses', 'What’sWrong with Secretary Kim', 'Touch Your Heart', 'The Secret Life of My Secretary', 'Strong Girl Bong-soon', 'Suspicious Partner', 'Secret Garden', 'She Was Pretty', 'Shopping King Louis', 'Oh My Venus', 'My Love from the Star', 'My FirstFirst Love', 'Legend of the Blue Sea', 'The Big Hit', 'Her Private Life', 'Beating Again', 'Emergency Couple', 'Clean with Passion for Now', 'Be Melodramatic']
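
A note on get_text(strip=True): it strips each piece of text inside the tag and then joins the pieces with an empty string, so a title that is split across several text nodes can lose a space. Whether that is what produced entries such as 'My FirstFirst Love' above is an assumption, since it depends on the page's actual markup. The sketch below uses hypothetical markup (the <br/> is invented to split the title) to show how passing a separator keeps the pieces apart.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <br/> is invented so the title becomes two text nodes.
html = '<h4 class="edgtf-pli-title entry-title">My First <br/>First Love</h4>'

soup = BeautifulSoup(html, 'html.parser')
h4 = soup.select_one('h4.edgtf-pli-title.entry-title')

joined = h4.get_text(strip=True)       # pieces stripped, then joined with ''
spaced = h4.get_text(' ', strip=True)  # pieces stripped, then joined with ' '

print(joined)
print(spaced)
```

If the titles come out with missing spaces, get_text(' ', strip=True) is worth trying in place of get_text(strip=True).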

Answer 3

You can use the .text attribute of a BeautifulSoup tag and then call .strip() on it, which removes the leading "\n" (newline) in front of each K-drama name.

import requests
from bs4 import BeautifulSoup


page = requests.get('https://kdramaclicks.com/kdrama/romantic-comedy/')
soup = BeautifulSoup(page.content,'html.parser')
names = soup.find_all('div',class_='edgtf-pli-text')
for name in names:
    print(name.text.strip())
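
For reference, .text gives the same string as get_text() called with no arguments, and .strip() only trims the two ends of that string. A minimal offline sketch, assuming two entries shaped like the markup in the question (the second title is taken from the list in Answer 1):

```python
from bs4 import BeautifulSoup

# Two sample entries in the shape shown in the question; no request is made.
html = '''
<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Crash Landing on You</h4></div>
<div class="edgtf-pli-text"><h4 class="edgtf-pli-title entry-title" itemprop="name">
Secret Garden</h4></div>
'''

soup = BeautifulSoup(html, 'html.parser')
names = soup.find_all('div', class_='edgtf-pli-text')

# .text and get_text() with default arguments return the same string.
same = names[0].text == names[0].get_text()

# Collect the titles in a list instead of printing them one by one.
titles = [name.text.strip() for name in names]
print(same)
print(titles)
```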
