How to get an attribute value with lxml on HTML
I have some HTML that I want to parse with lxml using Python. There are a number of elements on the page that each represent a poster. I want to grab each poster's ID, so that I can then scrape a piece of information off the poster's page. Currently the poster's ID is stored in the id attribute, so I want to use lxml to get the value of that attribute.
For example:
<div onclick="showDetail(9202)">
<div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>
I want to grab the "maincard_9202" from the id attribute, so that I can then use a regex to get the 9202. From there, I can use this value to go directly to the poster's page, since I know that the URL pattern goes from
https://nips.cc/Conferences/2017/Schedule?type=Poster (current page) to https://nips.cc/Conferences/2017/Schedule?showEvent=9202 (poster page).
I was trying to use the following code:
from lxml import html
import requests
page = requests.get('https://nips.cc/Conferences/2017/Schedule?type=Poster')
tree = html.fromstring(page.content)
paper_numbers = tree.xpath('//div[@onclick]/id/')
but this returns an empty list.
How can I get the attribute value in this case?
paper_numbers = tree.xpath('//div[@onclick]/div/@id')
print(paper_numbers)
would give you
['maincard_9202']
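It selects the id attributes of all div elements that are direct children of a div with the onclick attribute. In XPath, /div steps to a child element named div, and /@id selects that element's id attribute; the original /id/ step looked for a child element named id, which does not exist.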
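To get from the id values to the poster pages, the XPath result can be combined with a regex, as the question describes. The following is a minimal sketch rather than a definitive implementation: it parses the sample markup from the question instead of fetching the live page, and the names SAMPLE, BASE_URL and poster_urls are hypothetical, introduced only for illustration. The URL pattern is the one quoted in the question.

```python
import re

from lxml import html

# Sample markup copied from the question; for the live page you would
# pass requests.get(...).content to html.fromstring instead.
SAMPLE = """
<div onclick="showDetail(9202)">
  <div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>
"""

# Poster page URL pattern as quoted in the question (assumed still valid).
BASE_URL = "https://nips.cc/Conferences/2017/Schedule?showEvent={}"


def poster_urls(markup):
    """Return poster page URLs built from the maincard_<number> ids."""
    tree = html.fromstring(markup)
    # Attribute values come back as plain strings, e.g. 'maincard_9202'.
    ids = tree.xpath('//div[@onclick]/div/@id')
    urls = []
    for value in ids:
        # Keep only the trailing digits of the id.
        match = re.search(r'(\d+)$', value)
        if match:
            urls.append(BASE_URL.format(match.group(1)))
    return urls


print(poster_urls(SAMPLE))
```

If you need the element itself rather than just the attribute, selecting '//div[@onclick]/div' and calling .get('id') on each result is an equivalent alternative.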