How to get specific items that share the same class name and attributes

Question

How do I get specific items that share the same class name and attributes?

I need to get these 3 items:

Apr 14, 2013

580

Fort Pierce, FL

<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort 
Pierce, FL</a>

Answer 1

Since all three values sit inside <dd> tags, use .find_all():

from bs4 import BeautifulSoup

test = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>'''

soup = BeautifulSoup(test, 'html.parser')
data = soup.find_all("dd")
for d in data:
    print(d.text.strip())

Output:

Apr 14, 2013
580
Fort Pierce, FL
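
If you also need to know which value is which, one option is to pair each <dt> label with its <dd> value. The following is only a minimal sketch: `extract_user_info` is a hypothetical helper name, and the sample markup is the snippet from the question with the closing </dd> and </dl> tags added, which is an assumption about the real page.

```python
from bs4 import BeautifulSoup

# Markup from the question; the final </dd></dl> are assumed here.
html = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" rel="nofollow">580</a></dd>
</dl>
<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>
</dd>
</dl>'''


def extract_user_info(markup):
    """Map each <dt> label to the text of the <dd> in the same <dl> (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    info = {}
    # Only look at <dl> blocks carrying the pairsJustified class.
    for dl in soup.find_all("dl", class_="pairsJustified"):
        dt = dl.find("dt")
        dd = dl.find("dd")
        if dt is None or dd is None:
            continue  # skip blocks that lack a label or a value
        label = dt.get_text(strip=True).rstrip(":")  # "Joined:" -> "Joined"
        info[label] = dd.get_text(strip=True)
    return info


user_info = extract_user_info(html)
print(user_info)
```

Keying by label means a user without a Location entry simply lacks that key, instead of shifting the other values by one position.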

Answer 2

This is a good starting point:

In [18]: for a in response.css('.extraUserInfo'):
    ...:     print(a.css('*::text').extract())
    ...:     print('\n\n\n')
    ...:     
['\n', '\n', '\n', '\n']  # <--this (and other outputs like this) is because there is an extra `extraUserInfo` class block above the desired info block if the user has a user group picture/avatar below their username




['\n', '\n', 'Joined:', '\n', 'Mar 24, 2013', '\n', '\n', '\n', 'Messages:', '\n', '6,747', '\n', '\n']




['\n', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Mar 24, 2013', '\n', '\n', '\n', 'Messages:', '\n', '6,747', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Apr 14, 2013', '\n', '\n', '\n', 'Messages:', '\n', '580', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Fort Pierce, FL', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Oct 20, 2012', '\n', '\n', '\n', 'Messages:', '\n', '2,476', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Philadelphia, PA', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Dec 11, 2012', '\n', '\n', '\n', 'Messages:', '\n', '2,938', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Colorado', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Sep 30, 2016', '\n', '\n', '\n', 'Messages:', '\n', '833', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Indiana', '\n', '\n', '\n']


...

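There are many ways to approach this, and a little fiddling will get the data into whatever format you like. The above is only a starting point, because many rows output nothing but a list of newlines. This appears to happen for users who have a user group image (the Arizona Tesla one, for example): the extraUserInfo class is also used to group that HTML block, so an extra, empty block precedes the one holding the user info. There is probably a better way to group these.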

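Basically, response.css('.extraUserInfo') gathers every block with the extraUserInfo class, which seems to be the block that holds the user info you are looking for. Use the ::text pseudo-selector to extract all of the underlying text from there, then parse the resulting array.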
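
The "parse the array" step might look something like the sketch below. It assumes that every label ends with a colon and is followed by exactly one value, which holds for the output shown above but may not hold for every page. `parse_user_info` is a hypothetical helper, and `blocks` is a hand-copied sample of that output, not a live response.

```python
def parse_user_info(texts):
    """Turn a flat list of text nodes into a {label: value} dict (hypothetical helper)."""
    # Drop whitespace-only nodes such as '\n'.
    tokens = [t.strip() for t in texts if t.strip()]
    info = {}
    label = None
    for token in tokens:
        if token.endswith(":"):
            label = token[:-1]   # remember the label, e.g. "Joined"
        elif label is not None:
            info[label] = token  # the next non-empty token is its value
            label = None
    return info


# Two blocks copied from the output above: an empty one and a full one.
blocks = [
    ['\n', '\n', '\n', '\n'],
    ['\n', '\n', 'Joined:', '\n', 'Apr 14, 2013', '\n', '\n', '\n',
     'Messages:', '\n', '580', '\n', '\n', '\n',
     'Location:', '\n', '\n', 'Fort Pierce, FL', '\n', '\n', '\n'],
]

# Keep only blocks that produced at least one label/value pair.
users = [info for info in map(parse_user_info, blocks) if info]
print(users)
```

The empty extraUserInfo blocks parse to an empty dict, so filtering on truthiness discards them.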

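If you look closely at the HTML structure, there is certainly a better way to do this, one that extracts the data in a form needing less post-processing, but this should get you on the right track. The CSS selector and XPath documentation should help a lot.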

2 solutions

Solution 1
1 2019-04-23 15:14:52

Solution 2
0 (accepted) 2019-04-23 15:46:04
