How to select tags that have a certain attribute value

Question

Here is the markup.

Out of the page's otherwise messy HTML, I only want to grab tags like these:

<table bgcolor="FFFFFF" border="0" cellpadding="5" cellspacing="0" align="center">
    <tr>
        <td>
            <a href="./index.html?id=subjective&page=2">
                <img src='https://www.dogdrip.net/?module=file&act=procFileDownload&file_srl=224868098&sid=cc8c0afbb679bef6420500988a756054&module_srl=78' style='max-width:180px;max-height:270px' align='absmiddle' title="cutie cat">
            </a>
        </td>
    </tr>
</table>

First, I tried selecting them with this CSS selector:

#div_article_contents > tr:nth-child(1) > th:nth-child(1) > table > tbody > tr:nth-child(1) > td > table > tbody > tr > td > a > img

But soup.select('selector') doesn't work: it returns an empty list, and I don't know why.
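One common cause, assuming the selector was copied from the browser's developer tools (the full page isn't shown, so this is a guess): devtools describe the DOM the browser built, which includes tbody elements the browser inserts on its own. With 'html.parser', BeautifulSoup keeps the markup as written, so any selector that requires tbody matches nothing. A minimal sketch on the snippet above, with the long img src swapped for a placeholder:

```python
from bs4 import BeautifulSoup

# The question's snippet; the img src is a placeholder. Note: no <tbody> in the source.
html = """
<table bgcolor="FFFFFF" border="0" cellpadding="5" cellspacing="0" align="center">
    <tr>
        <td>
            <a href="./index.html?id=subjective&page=2">
                <img src='cat.jpg' style='max-width:180px;max-height:270px' align='absmiddle' title="cutie cat">
            </a>
        </td>
    </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A browser-style path that requires <tbody> finds nothing in the raw markup
with_tbody = soup.select("table > tbody > tr > td > a > img")

# The same path without tbody follows the structure that is really there
without_tbody = soup.select("table > tr > td > a > img")

print(len(with_tbody), len(without_tbody))
```

Shorter selectors that skip the layout tables entirely (for example "td > a > img") are usually less brittle than a full path copied from devtools.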

Second, since every tag I want to scrape has a specific style, I tried:

soup.select('img[style = fixedstyle]')

But that doesn't work either. It seems to be a syntax error...
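The style value itself is the likely problem: in a CSS attribute selector an unquoted value must be a plain identifier, and a style string contains ':' and ';'. A minimal sketch, using the style from the snippet above in place of the fixedstyle placeholder (the src is a placeholder too):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; src is a placeholder
html = """
<a href="./index.html?id=subjective&page=2">
    <img src='cat.jpg' style='max-width:180px;max-height:270px' align='absmiddle' title="cutie cat">
</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Unquoted: the value stops being a valid identifier at ':', so the selector fails to parse
try:
    soup.select("img[style=max-width:180px;max-height:270px]")
    unquoted_failed = False
except Exception as exc:  # soupsieve raises SelectorSyntaxError here
    unquoted_failed = True
    print(type(exc).__name__)

# Quoted: an exact match against the whole style string
quoted = soup.select('img[style="max-width:180px;max-height:270px"]')
print(unquoted_failed, [img["title"] for img in quoted])
```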

All I want to scrape is a list of the href links and a list of the img titles.

Please help.

Answer 1

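If the img tags have a specific style value, you can use what you tried. The value just has to be quoted, because an unquoted attribute value cannot contain ':' or ';' (the extra spaces can go too):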

from bs4 import BeautifulSoup

html='''
<a href='link'>
    <img src='address' style='max-width:222px;max-height:222px' title='owntitle'>
</a>
<a href='link'>
    <img src='address1' style='max-width:222px;max-height:222px' title='owntitle1'>
</a>
<a href='link'>
    <img src='address2' style='max-width:222px;max-height:222px' title='owntitle2'>
</a>
'''

srcs=[]
titles=[]
soup=BeautifulSoup(html,'html.parser')
# Quote the attribute value: it contains ':' and ';', which an unquoted value cannot
for img in soup.select('img[style="max-width:222px;max-height:222px"]'):
    srcs.append(img['src'])
    titles.append(img['title'])
print(srcs)
print(titles)
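An exact [style="..."] match stops working as soon as one image has a different size. If the style varies, the CSS substring operators may be more robust: *= matches a value that contains the quoted text anywhere, ^= one that starts with it. A sketch on hypothetical markup, assuming the wanted images are the only ones whose style mentions max-width:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two thumbnails with different sizes and one unrelated image
html = """
<a href='link1'><img src='a.jpg' style='max-width:222px;max-height:222px' title='owntitle'></a>
<a href='link2'><img src='b.jpg' style='max-width:180px;max-height:270px' title='owntitle1'></a>
<img src='logo.png' title='site logo'>
"""
soup = BeautifulSoup(html, "html.parser")

# *= : the style attribute contains "max-width" somewhere
contains = soup.select('img[style*="max-width"]')

# ^= : the style attribute starts with "max-width:180px"
starts = soup.select('img[style^="max-width:180px"]')

print([img["title"] for img in contains])
print([img["title"] for img in starts])
```

The image without a style attribute is skipped by both selectors, since an attribute operator only matches elements that have the attribute.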

Alternatively, you can start from the a tags and then go down into the img, like this:

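srcs=[]  # fresh lists, so the first loop's results are not repeated
titles=[]
for a in soup.select('a'):
    img=a.select_one('img')
    if img is None:  # skip links that contain no image
        continue
    srcs.append(img['src'])
    titles.append(img['title'])
print(srcs)
print(titles)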
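The question asks for each link's href and its image's title, while the loops above collect src. A sketch that pairs them by selecting only img elements with a title that sit directly inside an a with an href; the markup is the question's snippet with the image URL replaced by a placeholder:

```python
from bs4 import BeautifulSoup

# The question's snippet; the img src is a placeholder
html = """
<table bgcolor="FFFFFF" border="0" cellpadding="5" cellspacing="0" align="center">
    <tr>
        <td>
            <a href="./index.html?id=subjective&page=2">
                <img src='cat.jpg' style='max-width:180px;max-height:270px' align='absmiddle' title="cutie cat">
            </a>
        </td>
    </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
titles = []
# a[href] > img[title]: an <img> with a title that is a direct child of a link with an href
for img in soup.select("a[href] > img[title]"):
    hrefs.append(img.parent["href"])  # the enclosing <a> carries the link
    titles.append(img["title"])

print(hrefs)
print(titles)
```

Starting from the img and reading img.parent keeps the two lists aligned, because a link without a titled image never enters either list.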

Answer 1: 1 vote, accepted, 2019-09-09 04:46:44
