Getting attribute values with BeautifulSoup

Question

I want to get all data-js attribute values from the content using BeautifulSoup.

Input:

<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>

Output:

['1, 2, 3', '5', '4']

I've done it with lxml:

>>> content = """<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>"""
>>> import lxml.html as PARSER
>>> root = PARSER.fromstring(content)
>>> root.xpath("//*/@data-js")
['1, 2, 3', '5', '4']

I want the same result via BeautifulSoup.

Answer 1

The idea would be to find all elements that have a data-js attribute and collect the attribute values in a list:

from bs4 import BeautifulSoup


data = """
<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>
"""

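# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
# attrs={"data-js": True} matches any element that has the attribute, whatever its value
print([elm['data-js'] for elm in soup.find_all(attrs={"data-js": True})])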

Prints ['1, 2, 3', '5', '4'].
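
As a variation on the same idea, a CSS attribute selector can stand in for the attrs dictionary. This is a minimal sketch, assuming a bs4 version where select() is available and using the sample input from the question; it is an alternative, not the method the answer itself uses:

```python
from bs4 import BeautifulSoup

# Sample input copied from the question
data = (
    '<p data-js="1, 2, 3">some text..</p>'
    '<p data-js="5">some 1 text</p>'
    '<p data-js="4"> some 2 text. </p>'
)

soup = BeautifulSoup(data, "html.parser")

# "[data-js]" matches every element that carries a data-js attribute
values = [elm["data-js"] for elm in soup.select("[data-js]")]
print(values)
```

Both approaches walk the whole tree, so the choice between them is mostly a matter of taste.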

Answer 2

A possibly faster method uses map instead of a list comprehension.

from bs4 import BeautifulSoup
d = "..."
# create a soup instance
soup = BeautifulSoup(d, "html.parser")
# find all p elements that have a data-js attribute
p = soup.find_all('p', attrs={"data-js": True})
# extract the data-js attribute from each p element into a new list
# (in Python 3, map returns an iterator, so wrap it in list() before printing)
print(list(map(lambda x: x['data-js'], p)))

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
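
Whether map is actually faster here is easy to check rather than assume. The following is a rough sketch using timeit; the sample markup is taken from the question (the answer's own input is elided), and the helper names with_map and with_comprehension are made up for illustration. Timings will vary by machine and Python version:

```python
import timeit

from bs4 import BeautifulSoup

# Sample input from the question; the answer's own string is not shown
d = (
    '<p data-js="1, 2, 3">some text..</p>'
    '<p data-js="5">some 1 text</p>'
    '<p data-js="4"> some 2 text. </p>'
)

soup = BeautifulSoup(d, "html.parser")
p = soup.find_all("p", attrs={"data-js": True})


def with_map():
    # hypothetical helper: the map-based extraction
    return list(map(lambda x: x["data-js"], p))


def with_comprehension():
    # hypothetical helper: the list-comprehension extraction
    return [x["data-js"] for x in p]


# Both variants must produce the same list before timing means anything
same_result = with_map() == with_comprehension()

print("map:          ", timeit.timeit(with_map, number=10000))
print("comprehension:", timeit.timeit(with_comprehension, number=10000))
```

In either case the parsing and find_all call are likely to dominate the total run time, so the difference between the two extraction styles is probably small in practice.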

Answer 3

You can use find_all() for this, but you have to put the attribute name in a dictionary: data-js contains a hyphen, so it is not a valid Python identifier and can't be used as a keyword argument by itself.

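from bs4 import BeautifulSoup
html = BeautifulSoup(content, "html.parser")
# data is a list of matching tags; read tag['data-js'] to get each value
data = html.find_all(attrs={'data-js': True})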

See here for more explanation.
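
To make the point about keyword arguments concrete, here is a small sketch using the sample input from the question. The variable names tags and values are illustrative; the commented-out line shows the call that Python rejects before BeautifulSoup ever sees it:

```python
from bs4 import BeautifulSoup

# Sample input copied from the question
content = (
    '<p data-js="1, 2, 3">some text..</p>'
    '<p data-js="5">some 1 text</p>'
    '<p data-js="4"> some 2 text. </p>'
)

html = BeautifulSoup(content, "html.parser")

# html.find_all(data-js=True)  # SyntaxError: "data-js" is not a valid identifier

# Passing the name as a dictionary key sidesteps the identifier restriction
tags = html.find_all(attrs={"data-js": True})

# find_all returns tags, so the attribute values still have to be read out
values = [tag["data-js"] for tag in tags]
print(values)
```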

3 answers

Answer 1
4 2015-06-12 13:21:39

Answer 2
3 2015-06-12 13:30:59

Answer 3
2 2015-06-12 13:22:13