
Can't find element by xpath

I have the following two HTML elements:

<a class="btn btn-small downloadlink" rel="nofollow" data-
toggle="tooltip" data-format="ico" data-icon-id="1715795" 
href="/icons/1715795/download/ico" data-original-title="Download this 
icon in ICO format for use in Windows."><i class="download-icon"></i><b>
ICO</b></a>

<a class="btn btn-small downloadlink" rel="nofollow" data-
toggle="tooltip" data-format="icns" data-icon-id="1715795" 
href="/icons/1715795/download/icns" data-original-title="Download this 
icon in ICNS format for use in Apple OS X."><i class="download-icon"></i><b>
ICNS</b></a>

(from here: https://www.iconfinder.com/icons/1715795/earth_planet_space_icon#size=128 )

Using Selenium, I want to select the element with the attribute:

data-format="icns"

I've tried something like:

driver.find_element_by_xpath('//*[@data-format="icns"]')

but it gives the following error message:

selenium.common.exceptions.NoSuchElementException: Message: no such element:
Unable to locate element: {"method":"xpath","selector":"//*[@data-format=icns]"}

Q: How can I select the second element?

I know I can just copy the XPath from the inspector, but that would leave me with a very unstable scraping script, since tiny changes in the page layout could mean my XPath expression is no longer valid.

Thanks in advance!
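
Note: a NoSuchElementException with that XPath often just means the download buttons are rendered after the initial page load (or the driver is looking in the wrong frame). A minimal sketch with an explicit wait, assuming the current Selenium Python bindings; the Chrome driver setup here is only an example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.iconfinder.com/icons/1715795/earth_planet_space_icon#size=128")

# Wait up to 10 seconds for a link with data-format="icns" to appear,
# in case the buttons are added to the DOM after the initial load.
icns_link = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//a[@data-format="icns"]'))
)
print(icns_link.get_attribute("href"))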

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import lxml.html

url = "https://www.iconfinder.com/icons/1715795/earth_planet_space_icon#size=128"

# If you use Selenium, comment out this line.
resp = requests.get(url)

# With Selenium, replace the line below with:
# source_code = browser.page_source

source_code = resp.text

root = lxml.html.fromstring(source_code)

id_icon = set(root.xpath('//*[@data-format="icns"]//@data-icon-id'))

# Only needed if you want to download all of the icons:

for id in id_icon:
    url = 'https://www.iconfinder.com/icons/{0}/check-download/icns'.format(id)

    local_filename = '{0}.icns'.format(id)
    resp = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=1024): 
            if chunk:
                f.write(chunk)

    print "downloaded  {0}".format(local_filename)

If you want to scrape a site and download some objects, I can recommend Scrapy - it's much better/faster/more reliable for scraping than Selenium.

For example:

$ scrapy shell
2017-01-08 23:36:58 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
2017-01-08 23:36:58 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
.... debug info ....
2017-01-08 23:36:58 [scrapy] INFO: Enabled item pipelines:
[]
2017-01-08 23:36:58 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x10cfe7cc0>
[s]   item       {}
[s]   settings   <scrapy.settings.Settings object at 0x10cfe7eb8>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

# fetch data and prepare all helpers to work
>>> fetch('https://www.iconfinder.com/icons/1715795/earth_planet_space_icon#')

# now object `response` contains result of our request
>>> response
<200 https://www.iconfinder.com/icons/1715795/earth_planet_space_icon>

# let's check links:
>>> response.xpath('//*[@data-format="icns"]')
[<Selector xpath='//*[@data-format="icns"]' data='<a class="btn btn-small downloadlink" re'>,
 <Selector xpath='//*[@data-format="icns"]' data='<a class="btn btn-small downloadlink" re'>,
 <Selector xpath='//*[@data-format="icns"]' data='<a class="btn btn-small downloadlink" re'>,
 <Selector xpath='//*[@data-format="icns"]' data='<a class="btn btn-small downloadlink" re'>, ...]

# extract first of them
>>> link = response.xpath('//*[@data-format="icns"]')[0]
>>> link.extract()
'<a class="btn btn-small downloadlink" rel="nofollow" title="Download this icon in ICNS format for use in Apple OS X." data-toggle="tooltip" data-format="icns" data-icon-id="1715795" href="/icons/1715795/download/icns"><i class="download-icon"></i><b>\n    ICNS</b></a>'

# URL of a link
>>> link.select('@href').extract_first()
'/icons/1715795/download/icns'

# add host and other params to form a full URL
>>> response.urljoin(link.select('@href').extract_first())
'https://www.iconfinder.com/icons/1715795/download/icns'
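
Still in the shell, you can fetch that URL and write the response body to a file; a rough continuation (the filename is just an example, and whether the download actually succeeds may depend on the site requiring a logged-in session):

# download the file behind the link and save it locally
>>> fetch(response.urljoin(link.select('@href').extract_first()))
>>> with open('1715795.icns', 'wb') as f:
...     f.write(response.body)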

In fact, Scrapy is capable of much, much more - for example, it can recursively find and download all the links for you.
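
To give an idea of what that looks like outside the shell, here is a rough spider sketch based on the same selectors (the class name, callback and filenames are made up, and it is untested against the live site):

import scrapy

class IconSpider(scrapy.Spider):
    name = "icons"
    start_urls = [
        "https://www.iconfinder.com/icons/1715795/earth_planet_space_icon",
    ]

    def parse(self, response):
        # Follow every ICNS download link found on the page.
        for href in response.xpath('//a[@data-format="icns"]/@href').extract():
            yield scrapy.Request(response.urljoin(href), callback=self.save_icon)

    def save_icon(self, response):
        # URL looks like .../icons/1715795/download/icns -> use the icon id as filename.
        icon_id = response.url.rstrip('/').split('/')[-3]
        with open('{0}.icns'.format(icon_id), 'wb') as f:
            f.write(response.body)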
