Python's lxml.html: grab all values at once

Using lxml.html, I was able to get the data-pid using fromstring(source).xpath('/html/body/article/section/div[1]/div[2]/p[2]')[0].get('data-pid')

However, it only returns one of them (in this case 4559733570). I recall being able to grab all of them at once, but I don't remember how. Can somebody point me in the right direction?

The HTML looks like this:

http://i.imgur.com/hn0Jqyi.png

XPath, directly returning all the values

Assuming you want the data-pid attribute of every p element:

>>> fromstring(source).xpath("//p/@data-pid")

should return what you need: a list with the attribute value of every p element that has one.
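
As a minimal sketch of this approach: the original markup is only available as a screenshot, so the `source` string below is a made-up stand-in, and the second pid value is invented for illustration.

```python
from lxml.html import fromstring

# Hypothetical markup standing in for the page in the screenshot.
source = """
<html><body>
  <p data-pid="4559733570">first</p>
  <p data-pid="4559733571">second</p>
  <p>no data-pid on this one</p>
</body></html>
"""

# Selecting the attribute node itself makes xpath() return the values
# as a list of strings; elements without the attribute are skipped.
pids = fromstring(source).xpath("//p/@data-pid")
print(pids)
```

Note that `//p` matches p elements anywhere in the document, so this is only appropriate if no unrelated p carries a data-pid attribute.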

From your png and xpath query, it seems like all the <p> elements you are interested in are nested in the same <div>. The xpath query /html/body/article/section/div[1]/div[2]/p[2] will return only the second <p> element in the selected div (because of the [2]). If you want all the paragraphs in the div, use /html/body/article/section/div[1]/div[2]/p.

[ p.get("data-pid") for p in fromstring(source).xpath('/html/body/article/section/div[1]/div[2]/p') ]
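
The difference between the two queries can be sketched as follows. The markup and pid values here are hypothetical, chosen only to mirror the nesting implied by the xpath in the question.

```python
from lxml.html import fromstring

# Hypothetical markup matching the path article/section/div[1]/div[2]/p.
source = """
<html><body><article><section>
  <div>
    <div>header</div>
    <div>
      <p data-pid="1001">a</p>
      <p data-pid="1002">b</p>
      <p data-pid="1003">c</p>
    </div>
  </div>
</section></article></body></html>
"""

tree = fromstring(source)
base = "/html/body/article/section/div[1]/div[2]/p"

# With the [2] predicate only the second <p> is selected.
second_only = [p.get("data-pid") for p in tree.xpath(base + "[2]")]

# Without the predicate every <p> in that div is selected.
all_pids = [p.get("data-pid") for p in tree.xpath(base)]

# Same result, letting XPath extract the attribute values directly.
direct = tree.xpath(base + "/@data-pid")

print(second_only, all_pids, direct)
```

One difference worth knowing: `p.get("data-pid")` yields None for a p without the attribute, while the `/@data-pid` form simply omits it.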
