
How to get a specific item having the same class name and attributes

How can I get a specific item when several elements share the same class name and attributes?

I need to get these 3 items:

April 14, 2013

580

Fort Pierce, FL

<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>
</dd>
</dl>

Since all three values lie under <dd> tags, you can use .find_all():

from bs4 import BeautifulSoup

test = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>
</dd>
</dl>'''

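soup = BeautifulSoup(test, 'html.parser')
# Every value sits in a <dd> that follows its <dt> label
data = soup.find_all("dd")
for d in data:
    # .text flattens nested tags such as <a>; strip() removes the surrounding newlines
    print(d.text.strip())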

Output:

Apr 14, 2013
580
Fort Pierce, FL
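
If you need one specific value rather than all three, it can help to key each <dd> by the label in its sibling <dt>. A minimal sketch, assuming every value sits in a <dl class="pairsJustified"> containing exactly one <dt>/<dd> pair, as in the question's HTML; the names html and info are just for illustration:

```python
from bs4 import BeautifulSoup

html = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" rel="nofollow">580</a></dd>
</dl>
<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>
</dd>
</dl>'''

soup = BeautifulSoup(html, "html.parser")

info = {}
for dl in soup.find_all("dl", class_="pairsJustified"):
    dt, dd = dl.find("dt"), dl.find("dd")
    if dt is None or dd is None:
        continue  # skip blocks that do not follow the label/value pattern
    # "Joined:" -> "Joined"; get_text(strip=True) drops whitespace around nested <a> tags
    info[dt.get_text(strip=True).rstrip(":")] = dd.get_text(strip=True)

print(info["Location"])  # Fort Pierce, FL
```

Looking a value up by its label keeps working when a profile omits a field (for example, a user with no location), whereas relying on its position in find_all("dd") would shift every item after the missing one.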

This is a good starting point (in the Scrapy shell):

In [18]: for a in response.css('.extraUserInfo'):
    ...:     print(a.css('*::text').extract())
    ...:     print('\n\n\n')
    ...:     
['\n', '\n', '\n', '\n']  # <-- this (and other outputs like it) appears because users with a user-group picture/avatar below their username get an extra `extraUserInfo` block above the block with the desired info




['\n', '\n', 'Joined:', '\n', 'Mar 24, 2013', '\n', '\n', '\n', 'Messages:', '\n', '6,747', '\n', '\n']




['\n', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Mar 24, 2013', '\n', '\n', '\n', 'Messages:', '\n', '6,747', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Apr 14, 2013', '\n', '\n', '\n', 'Messages:', '\n', '580', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Fort Pierce, FL', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Oct 20, 2012', '\n', '\n', '\n', 'Messages:', '\n', '2,476', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Philadelphia, PA', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Dec 11, 2012', '\n', '\n', '\n', 'Messages:', '\n', '2,938', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Colorado', '\n', '\n', '\n']




['\n', '\n', 'Joined:', '\n', 'Sep 30, 2016', '\n', '\n', '\n', 'Messages:', '\n', '833', '\n', '\n', '\n', 'Location:', '\n', '\n', 'Indiana', '\n', '\n', '\n']


...

There are many ways to approach this, and a little fiddling around will get the data formatted to your liking. The approach above is only a starting point: many of the outputs are lists containing nothing but newline characters. That seems to be because, when a user has a user-group image (like tesla of arizona), the extraUserInfo class is also used to wrap that block of HTML. There are probably better ways to group this...

Basically, response.css('.extraUserInfo') selects all blocks with the class extraUserInfo; these appear to be the blocks holding the user info you're looking for. From there, extract all the underlying text with the ::text pseudo-element (here *::text, which collects the text of every element in the block) and parse the resulting lists.
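
Parsing those lists could look like the sketch below. It assumes each label ends with a colon and is immediately followed by its value once the whitespace-only strings are dropped, which matches the sample output above; parse_user_info is a hypothetical helper, and the two blocks are copied from that output rather than fetched live (in the shell you would pass a.css('*::text').extract() for each block instead):

```python
def parse_user_info(texts):
    """Turn a flat list of text nodes into a {label: value} dict.

    Assumes labels end with ':' and the next non-blank string is the value.
    """
    # Drop the whitespace-only strings ('\n') that ::text returns between tags
    tokens = [t.strip() for t in texts if t.strip()]
    info = {}
    i = 0
    while i < len(tokens) - 1:
        label, value = tokens[i], tokens[i + 1]
        if label.endswith(":") and not value.endswith(":"):
            info[label[:-1]] = value  # "Joined:" -> "Joined"
            i += 2
        else:
            i += 1  # not a label/value pair; move on
    return info


blocks = [
    ['\n', '\n', '\n', '\n'],
    ['\n', '\n', 'Joined:', '\n', 'Apr 14, 2013', '\n', '\n', '\n', 'Messages:', '\n', '580',
     '\n', '\n', '\n', 'Location:', '\n', '\n', 'Fort Pierce, FL', '\n', '\n', '\n'],
]

# The avatar/user-group blocks yield empty dicts, so filter them out
users = [info for info in (parse_user_info(b) for b in blocks) if info]
print(users)
```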

There is definitely a better way to approach this if you look carefully at the HTML structure: extracting the data more precisely leaves you less processing work afterwards. Still, this should get you on the right track. The CSS selector and XPath documentation should be a great help.
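
As a sketch of that more targeted approach: an XPath predicate such as dl[dt='Location:']/dd picks the value by its label, so there is nothing left to clean up afterwards. To keep the example self-contained it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it will not cope with a real, messy page; field is a hypothetical helper. In the Scrapy shell the equivalent would be roughly response.xpath('//dl[dt="Location:"]/dd//text()').getall().

```python
import xml.etree.ElementTree as ET

# The question's snippet, wrapped in a single root element so it parses as XML
snippet = '''<root>
<dl class="pairsJustified"><dt>Joined:</dt><dd>Apr 14, 2013</dd></dl>
<dl class="pairsJustified"><dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" rel="nofollow">580</a></dd></dl>
<dl class="pairsJustified"><dt>Location:</dt>
<dd><a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a></dd></dl>
</root>'''

root = ET.fromstring(snippet)


def field(label):
    """Return the text of the <dd> whose sibling <dt> reads exactly `label`, or None."""
    # dl[dt='...'] keeps only <dl> elements having a <dt> child whose full text equals label
    dd = root.find(f".//dl[dt='{label}']/dd")
    # itertext() also walks into the nested <a>
    return "".join(dd.itertext()).strip() if dd is not None else None


print(field("Location:"))
```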

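Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.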

Related questions:
In Python selenium, how to get return texts from class having same name?
How to parse several attributes of website with same class name in python?
How to get attributes in class that extends from the same parent class
How to locate element by class name and specific attribute name at the same time
How to get the name of a class from a user and then create a class with that same name
Get attributes of class item from string associated with class item
How to get scrape DIV having Class or ID containing specific text
How to get the first ul inside div having a class name
Scrapy scrape content having same class name
Having multiple span tags with the same class get the specific one webscraping with beautiful soup
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM