Using lxml.html, I was able to get the data-pid attribute with fromstring(source).xpath('/html/body/article/section/div[1]/div[2]/p[2]')[0].get('data-pid').
However, it only returns one of them (in this case 4559733570). I recall being able to grab all of them at once, but I don't remember how. Can somebody point me in the right direction?
The HTML code looks like this:
Assuming you care about the data-pid attributes of all p elements:
>>> fromstring(source).xpath("//p/@data-pid")
should return what you need: a list with one string per matching attribute.
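A minimal sketch of selecting the attribute directly in XPath. The question's real HTML isn't shown, so the markup below and every ID except 4559733570 are made up for illustration. Because the expression ends in /@data-pid, lxml returns the attribute values themselves rather than elements, so no loop or .get() call is needed, and <p> elements without the attribute are skipped.

```python
from lxml.html import fromstring

# Hypothetical markup standing in for the page from the question.
source = """
<html><body><article><section>
  <div>
    <div>
      <p>no id here</p>
      <p data-pid="4559733570">first</p>
      <p data-pid="1111111111">second</p>
      <p data-pid="2222222222">third</p>
    </div>
  </div>
</section></article></body></html>
"""

# Selecting the attribute node yields its string value for every match;
# the first <p> has no data-pid, so it contributes nothing.
pids = fromstring(source).xpath("//p/@data-pid")
print(pids)  # ['4559733570', '1111111111', '2222222222']
```

Note that //p searches the whole document, so it will also pick up data-pid values from <p> elements outside the div you care about, if any exist.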
From your PNG and XPath query, it seems that all the <p> elements you are interested in are nested in the same <div>. The XPath query /html/body/article/section/div[1]/div[2]/p[2] returns only the second <p> element in the selected div (that is what the [2] predicate means). If you want all the paragraphs in the div, use /html/body/article/section/div[1]/div[2]/p instead:
from lxml.html import fromstring
pids = [p.get("data-pid") for p in fromstring(source).xpath('/html/body/article/section/div[1]/div[2]/p')]
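To make the difference between p[2] and p concrete, here is a small sketch against hypothetical markup shaped like the path in the question (section, then div[1], then div[2], then several <p> elements). The HTML and the IDs other than 4559733570 are invented for illustration; only the XPath expressions come from the question and answer.

```python
from lxml.html import fromstring

# Hypothetical markup matching the shape of the question's path.
source = """
<html><body><article><section>
  <div>
    <div>unrelated block</div>
    <div>
      <p data-pid="1111111111">first</p>
      <p data-pid="4559733570">second</p>
      <p data-pid="2222222222">third</p>
    </div>
  </div>
</section></article></body></html>
"""

root = fromstring(source)
base = "/html/body/article/section/div[1]/div[2]"

# p[2] is a positional predicate: only the second <p> child matches.
second_only = [p.get("data-pid") for p in root.xpath(base + "/p[2]")]

# Without the predicate, every <p> child of that div matches.
all_pids = [p.get("data-pid") for p in root.xpath(base + "/p")]

print(second_only)  # ['4559733570']
print(all_pids)     # ['1111111111', '4559733570', '2222222222']
```

Compared with //p/@data-pid, the absolute path keeps the search scoped to one div, at the cost of breaking if the page layout changes.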