
Trouble getting the trade-price using “Requests-HTML” library

I've written a script in Python to get the price of the last trade from a JavaScript-rendered webpage. I can get the content if I choose to go with selenium. My goal here is not to use any browser simulator like selenium, because the latest release of Requests-HTML is supposed to be able to parse JavaScript-generated content. However, I have not been able to make it work. When I run the script, I get the following error. Any help with this will be highly appreciated.

Site address: webpage_link

The script I've tried with:

import requests_html

with requests_html.HTMLSession() as session:
    r = session.get('https://www.gdax.com/trade/LTC-EUR')
    js = r.html.render()
    item = js.find('.MarketInfo_market-num_1lAXs',first=True).text
    print(item)

This is the complete traceback:

Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
    self._timeout)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
    raise error
concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\experiment.py", line 6, in <module>
    item = js.find('.MarketInfo_market-num_1lAXs',first=True).text
AttributeError: 'NoneType' object has no attribute 'find'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\shutil.py", line 387, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\ar\\.pyppeteer\\.dev_profile\\tmp1gng46sw\\CrashpadMetrics-active.pma'

The price I'm after is available at the top of the page, visible like this: 177.59 EUR Last trade price. I wish to get 177.59 or whatever the current price is.

You have several errors. The first is a 'navigation' timeout, showing that the page didn't complete rendering:

Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
    self._callback(*self._args)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
    self._timeout)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
    raise error
concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded

This traceback is not raised in the main thread, so your code was not aborted because of it. Your page may or may not be complete; you may want to set a longer timeout or introduce a sleep cycle to give the browser time to process AJAX responses.

Next, the response.html.render() call returns None. It loads the HTML into a headless Chromium browser, leaves JavaScript rendering to that browser, then copies the page HTML back into the response.html data structure in place; nothing needs to be returned. So js is set to None, not a new HTML instance, causing your next traceback.

Use the existing response.html object to search, after rendering:

r.html.render()
item = r.html.find('.MarketInfo_market-num_1lAXs', first=True)

There is most likely no such CSS class, because the last 5 characters are generated on each page render, after JSON data is loaded over AJAX. This makes it hard to use CSS to find the element in question.

Moreover, I found that without a sleep cycle, the browser has no time to fetch AJAX resources and render the information you wanted to load. Give it, say, 10 seconds of sleep to do some work before copying back the HTML. Set a longer timeout (the default is 8 seconds) if you see network timeouts:

r.html.render(timeout=10, sleep=10)

You could set the timeout to 0 too, to remove the timeout and just wait indefinitely until the page has loaded.
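For example, combining an unlimited navigation timeout with the sleep cycle from above (the values here are illustrative):

r.html.render(timeout=0, sleep=10)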

Hopefully a future API update also provides features to wait for network activity to cease.

You can use the included parse library to find the matching CSS classes:

# search for CSS suffixes
suffixes = [r[0] for r in r.html.search_all('MarketInfo_market-num_{:w}')]
for suffix in suffixes:
    # for each suffix, find all matching elements with that class
    items = r.html.find('.MarketInfo_market-num_{}'.format(suffix))
    for item in items:
        print(item.text)

Now we get output:

169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC
169.81 EUR
+
1.01 %
18,420 LTC

Your last traceback shows that the Chromium user data path could not be cleaned up. The underlying Pyppeteer library configures the headless Chromium browser with a temporary user data path, and in your case the directory contains some still-locked resource. You can ignore the error, although you may want to try and remove any remaining files in the .pyppeteer folder at a later time.
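If the leftover files bother you, a best-effort cleanup sketch (the .dev_profile location is taken from your traceback; treat the path layout as an assumption):

import shutil
from pathlib import Path

# Remove leftover Chromium dev profiles created by pyppeteer;
# ignore_errors skips anything that is still locked.
profile_root = Path.home() / '.pyppeteer' / '.dev_profile'
for tmpdir in profile_root.glob('tmp*'):
    shutil.rmtree(tmpdir, ignore_errors=True)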

Do you need it to go through Requests-HTML? On the day you posted, the repo was 4 days old, and in the 3 days that have passed there have been 50 commits. It's not going to be completely stable for some time.

See here: https://github.com/kennethreitz/requests-html/graphs/commit-activity

OTOH, there is an API for GDAX.

https://docs.gdax.com/#market-data
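For example, a minimal sketch against the public ticker endpoint with plain requests (the endpoint path and the price field follow the market-data docs above; verify them before relying on this):

import requests

# Public ticker for the LTC-EUR product; no authentication needed
resp = requests.get('https://api.gdax.com/products/LTC-EUR/ticker')
resp.raise_for_status()
print(resp.json()['price'])  # e.g. '177.59'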

Now if you're dead set on using Py3, there is a Python client listed on the GDAX website. Upfront I'll mention that it's the unofficial client; however, if you use this you'd be able to quickly and easily get responses from the official GDAX API.

https://github.com/danpaquin/gdax-python
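A quick sketch with that client (class and method names follow the project's README; treat them as assumptions):

import gdax

# Market data needs no API keys, so the public client suffices
public_client = gdax.PublicClient()
ticker = public_client.get_product_ticker(product_id='LTC-EUR')
print(ticker['price'])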

In case you want another way, here is a Selenium web-scraping approach:

from selenium import webdriver

chrome_path = r"C:\Users\Mike\Desktop\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)
driver.get("https://www.gdax.com/trade/LTC-EUR")

# Grab the last-trade price element and print its text
item = driver.find_element_by_xpath('''//span[@class='MarketInfo_market-num_1lAXs']''')
print(item.text)
driver.close()

result: 177.60 EUR
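Since the page builds its content with JavaScript, an immediate find_element call can race the AJAX load. A hedged variant using an explicit wait (the 10-second limit is an arbitrary choice):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r"C:\Users\Mike\Desktop\chromedriver.exe")
driver.get("https://www.gdax.com/trade/LTC-EUR")

# Wait up to 10 seconds for the price element before reading it
item = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//span[@class='MarketInfo_market-num_1lAXs']")))
print(item.text)
driver.close()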
