
How to extract the href for a given link text using lxml XPath and requests in Python

First of all, I am relatively new to Python. I need to extract a link from the text in a web page. I am using lxml with Python 3.5, but I can't figure it out. This is what I have so far:

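import requests
from lxml import html

url = someUrl  # placeholder for the address of the page
page = requests.get(url)
webpage = html.fromstring(page.content)
fulllinks = webpage.xpath('//a/@href')   # every href on the page
fulltext = webpage.xpath('//a/text()')   # every link text on the page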


for line in fulltext:
    if line.startswith("SomethingHere"):
        # get the link from SomethingHere and do other stuff
        pass

where "SomethingHere" is the link text, and I want the link that belongs to that text (e.g. www.someweb.com.br/trends).

I'm kind of lost here. Thanks in advance.

I got what I was looking for. The answer is:

webpage.xpath("//a[starts-with(text(),'SomethingHere')]/@href")

Thanks anyway.
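
For anyone who wants to try the expression without fetching a page, here is a minimal sketch that runs it against an inline HTML snippet. The SAMPLE markup and the names links and links_loop are made up for illustration; only the XPath expression comes from the answer above. It also shows a plain Python loop that should give the same result by reading each anchor's text and href together, which is what the separate fulllinks and fulltext lists in the question could not do.

```python
from lxml import html

# Hypothetical markup standing in for the downloaded page (an assumption,
# not the real site from the question).
SAMPLE = """
<html><body>
  <a href="www.someweb.com.br/trends">SomethingHere: trends</a>
  <a href="www.someweb.com.br/other">Other link</a>
</body></html>
"""

webpage = html.fromstring(SAMPLE)

# The XPath from the answer: href of every <a> whose text starts with the prefix.
links = webpage.xpath("//a[starts-with(text(), 'SomethingHere')]/@href")

# The same idea as a Python loop: each <a> element carries both its text
# and its href, so the two never get out of step.
links_loop = [
    a.get("href")
    for a in webpage.xpath("//a[@href]")
    if (a.text or "").startswith("SomethingHere")
]

print(links)
print(links_loop)
```

One thing to keep in mind: in XPath 1.0, starts-with(text(), ...) only looks at the first text node of each link, so a link whose text begins with whitespace or with a nested tag may not match. Using normalize-space(.) in place of text() is a common way around that.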
