How to get an attribute value from HTML using lxml

Question

I have some HTML that I want to parse with lxml using Python. There are a number of elements on the page, each representing a poster. I want to grab each poster's ID so that I can then scrape a piece of information off the poster's page. The poster's ID is currently stored in the id attribute, so I want to use lxml to get the value of that attribute.

For example:

<div onclick="showDetail(9202)">               
    <div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>

I want to grab "maincard_9202" from the id attribute so that I can then use a regex to get 9202. From there, I can use this value to go directly to the poster's page, since I know that the URL redirect pattern goes from

https://nips.cc/Conferences/2017/Schedule?type=Poster (current page) to https://nips.cc/Conferences/2017/Schedule?showEvent=9202 (poster page)
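A minimal sketch of the regex step described above, assuming every poster id has the form maincard_<digits> (the question shows only one example) and that the showEvent URL pattern holds for all posters. The helper name poster_url is hypothetical.

```python
import re

# Poster-page URL prefix taken from the question; the number goes after "showEvent=".
BASE_URL = "https://nips.cc/Conferences/2017/Schedule?showEvent="


def poster_url(element_id: str) -> str:
    """Hypothetical helper: turn an id like 'maincard_9202' into a poster URL."""
    # Assumption: ids are always 'maincard_' followed only by digits.
    match = re.fullmatch(r"maincard_(\d+)", element_id)
    if match is None:
        raise ValueError(f"unexpected id format: {element_id!r}")
    return BASE_URL + match.group(1)


print(poster_url("maincard_9202"))
# https://nips.cc/Conferences/2017/Schedule?showEvent=9202
```

Using re.fullmatch rather than re.search makes an unexpected id format fail loudly instead of silently picking up some other digits.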

I was trying to use the following code:

from lxml import html
import requests
page = requests.get('https://nips.cc/Conferences/2017/Schedule?type=Poster')
tree = html.fromstring(page.content)
paper_numbers = tree.xpath('//div[@onclick]/id/')

but this returns an empty list.

How can I get the attribute value in this case?

Answer 1

paper_numbers = tree.xpath('//div[@onclick]/div/@id')
print(paper_numbers)

would give you

['maincard_9202']

It selects the id attribute of every div that is a direct child of a div with an onclick attribute. The @ prefix is what makes XPath select an attribute; a bare id step looks for child elements named id instead.
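A self-contained sketch that runs the answer's XPath on the sample markup from the question instead of the live page, so it works offline. It also shows two variants: reading the attribute with the element's .get() method, and letting XPath 1.0's substring-after() strip the prefix so no separate regex is needed. The substring-after() variant assumes every id starts with maincard_.

```python
from lxml import html

# Sample markup copied from the question. For the live page, parse
# requests.get(url).content instead, as in the question's code.
SAMPLE = """
<div onclick="showDetail(9202)">
    <div class="maincard narrower Poster" id="maincard_9202"> </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# The answer's XPath: '@id' selects attribute values, returned as strings.
ids = tree.xpath('//div[@onclick]/div/@id')
print(ids)  # ['maincard_9202']

# Variant 1: select the inner div elements, then read the attribute with .get().
cards = tree.xpath('//div[@onclick]/div[@id]')
print([card.get('id') for card in cards])  # ['maincard_9202']

# Variant 2: let XPath strip the prefix (assumes every id starts with 'maincard_').
numbers = [card.xpath("substring-after(@id, 'maincard_')") for card in cards]
print(numbers)  # ['9202']
```

Selecting elements first (variant 1) is handy when you also need other attributes or text from the same card; selecting @id directly is shortest when the id is all you need.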

Answer 1: accepted, score 3, 2017-12-12 05:59:39