How to get an attribute value with lxml on HTML

I have some HTML that I want to parse with lxml in Python. The page contains a number of elements that each represent a poster. I want to grab each poster's ID so that I can then scrape a piece of information from that poster's page. The poster's ID is currently stored in the id attribute, so I want to use lxml to get the value of that attribute.

For example:

<div onclick="showDetail(9202)">               
    <div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>

I want to grab the "maincard_9202" from the id attribute, so that I can then use a regex to extract the 9202. With that value I can go directly to the poster's page, since I know that the URL redirect pattern goes from

https://nips.cc/Conferences/2017/Schedule?type=Poster (current page) to https://nips.cc/Conferences/2017/Schedule?showEvent=9202 (poster page).

I was trying to use the following code:

from lxml import html
import requests
page = requests.get('https://nips.cc/Conferences/2017/Schedule?type=Poster')
tree = html.fromstring(page.content)
paper_numbers = tree.xpath('//div[@onclick]/id/')

but this returns an empty list.

How can I get the attribute value in this case?

paper_numbers = tree.xpath('//div[@onclick]/div/@id')
print(paper_numbers)

would give you

['maincard_9202']

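It selects the id attribute of every div that is a direct child of a div with an onclick attribute. In XPath, the @ prefix addresses an attribute, whereas a step written as /id looks for a child element named id, which does not exist here.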
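To tie the pieces together, here is a minimal sketch that runs the same XPath against the sample markup from the question, strips the "maincard_" prefix with a regex, and builds the poster-page URLs. The helper name poster_ids and the constants SAMPLE and EVENT_URL are made up for illustration, and the URL pattern is assumed from the question's description rather than verified against the live site.

```python
import re
from lxml import html

# Sample markup copied from the question; for the real page you would pass
# requests.get(...).content instead.
SAMPLE = """
<div onclick="showDetail(9202)">
    <div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>
"""

# Assumed URL pattern, taken from the question's description of the redirect.
EVENT_URL = "https://nips.cc/Conferences/2017/Schedule?showEvent={}"


def poster_ids(markup):
    """Return the numeric part of each 'maincard_<n>' id (hypothetical helper)."""
    tree = html.fromstring(markup)
    # @id selects the attribute value; the result is a list of strings.
    raw_ids = tree.xpath('//div[@onclick]/div/@id')
    numbers = []
    for raw in raw_ids:
        match = re.fullmatch(r'maincard_(\d+)', raw)
        if match:  # skip ids that do not follow the expected pattern
            numbers.append(match.group(1))
    return numbers


ids = poster_ids(SAMPLE)
urls = [EVENT_URL.format(n) for n in ids]
print(ids)
print(urls)
```

If you already hold an element object rather than an XPath result, element.get('id') returns the same attribute value.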
