Python BS4 find() and find_all() return an empty list

Question

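Hi, I am trying to scrape the website https://www.dawn.com/pakistan, but the Python find() and find_all() methods return an empty list. I have tried html.parser, html5lib and lxml, still with no luck. The classes I am trying to scrape are present in the page source as well as in the soup object, but it just doesn't seem to work. Any help would be appreciated, thanks!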

Code:

from bs4 import BeautifulSoup
import lxml
import html5lib
import urllib.request

url1 = 'https://www.dawn.com/pakistan'

req = urllib.request.Request(
    url1,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
url1UrlContent = urllib.request.urlopen(req).read()
soup1 = BeautifulSoup(url1UrlContent, 'lxml')

url1Section1 = soup1.find_all('h2', class_='story__title-size-five-text-black- font--playfair-display')
print(url1Section1)

Answer 1

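Your approach should work fine too (I just used a different syntax). However, the string you have does not match the class attribute on the page. A multi-class string passed this way is compared against the full value of the class attribute, so it has to match exactly.

You have: 'story__title-size-five-text-black- font--playfair-display'

And I have: 'story__title size-five text-black font--playfair-display '. It is a very small difference: the classes are separated by spaces, not hyphens.

Replace:

url1Section1=soup1.find_all('h2', class_='story__title-size-five-text-black- font--playfair-display')

with:

url1Section1=soup1.find_all('h2', {'class':'story__title size-five text-black font--playfair-display '})

See if that helps.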
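To make the mismatch concrete, here is a minimal sketch that runs against a small made-up HTML snippet rather than the live site. The markup is an assumption that only mirrors the class names discussed above, and html.parser is used so the example needs nothing beyond bs4. It compares the hyphenated string from the question, a single class name, and a CSS selector that requires all four classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class attribute discussed above;
# the real page structure may differ.
html = """
<h2 class="story__title size-five text-black font--playfair-display">
  <a href="/news/1">First headline</a>
</h2>
<h2 class="story__title size-three">
  <a href="/news/2">Second headline</a>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# The string from the question: hyphens where the page has spaces,
# so it equals neither a single class nor the full attribute value.
wrong = soup.find_all(
    "h2", class_="story__title-size-five-text-black- font--playfair-display"
)

# A single class name matches every tag that carries that class.
by_one_class = soup.find_all("h2", class_="story__title")

# A CSS selector requires all listed classes, regardless of their order
# or of the whitespace in the attribute.
by_all_classes = soup.select(
    "h2.story__title.size-five.text-black.font--playfair-display"
)

print(len(wrong), len(by_one_class), len(by_all_classes))
```

If the exact attribute string is hard to copy reliably, matching on one distinctive class or using the CSS selector form is usually less fragile than matching the whole string.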

Answer 2

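I don't think you can pass a compound class name like that. These are compound class names, so I used CSS selectors instead, which are also a faster way to retrieve the elements. In a selector, the spaces between the class names are replaced with ".".

If you are after the titles, you can use a slightly different selector combination: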

import requests
from bs4 import BeautifulSoup

url= 'https://www.dawn.com/pakistan'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
items = [item.text.strip() for item in soup.select('h2[data-layout=story] a')]
print(items)

To restrict the results to the left-hand side, you can use:

items = [item.text.strip() for item in soup.select('.story__title.size-five.text-black.font--playfair-display a' )]

More broadly:

items = [item.text.strip() for item in soup.select('article [data-layout=story]')]

Based on your comment:

items = [item.text.strip() for item in soup.select('.col-sm-6.col-12')]
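The selectors above can be tried offline with a rough sketch like the following. The HTML is invented for illustration (the article, h2 and data-layout structure is an assumption, not the site's actual markup), and texts is a hypothetical helper, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure may differ.
html = """
<article class="col-sm-6 col-12">
  <h2 data-layout="story" class="story__title size-five text-black font--playfair-display">
    <a href="/news/1"> Left column headline </a>
  </h2>
</article>
<article class="col-sm-6 col-12">
  <h2 data-layout="story" class="story__title size-three">
    <a href="/news/2">Other headline</a>
  </h2>
</article>
"""

soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the stripped text of every element matched by a CSS selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


# Attribute selector: every link inside an h2 with data-layout="story".
all_titles = texts("h2[data-layout=story] a")

# Compound class selector: the class names are chained with "." instead of spaces.
left_titles = texts(".story__title.size-five.text-black.font--playfair-display a")

print(all_titles)
print(left_titles)
```

Chaining classes with "." means an element must carry every listed class, while the attribute selector ignores classes entirely, which is why the first query is broader than the second.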

Answer 1
1 2018-12-12 14:20:29

Answer 2
0 2018-12-12 14:30:10
