
XPath Always Returns Empty List

I am trying to extract the time value from this website.

Below is the code that I am using:

import requests
from lxml import html

page = requests.get('https://beta.nseindia.com/get-quotes/derivatives?symbol=NIFTY&identifier=OPTIDXNIFTY26-12-2019CE12300.00')
tree = html.fromstring(page.content)
test1 = tree.xpath('//*[@id="equity-derivative-op-timeStamp"]/text()')

print(test1)

Result:

[]

How can I get the timestamp value shown in the "Option Chain" tab of the above page, at that particular XPath?

You're getting back an empty result because, if you examine the page source of the URL you're fetching, the equity-derivative-op-timeStamp element is empty:

<span id="equity-derivative-op-timeStamp" class="asondate"></span>
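To see why the query comes back empty, it can help to look at what the static markup contains before any JavaScript has run. The following is a minimal sketch that uses only the standard library's html.parser rather than lxml, so that it runs without extra packages. The `text_inside` helper is hypothetical, and the "rendered" string is an assumed illustration of what the element might look like after the browser has filled it in.

```python
from html.parser import HTMLParser


class _TextById(HTMLParser):
    """Collect the text nodes found inside the element with a given id.

    Sketch only: assumes well-nested markup with no void tags (such as <br>)
    inside the target element.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_inside(html_source, element_id):
    """Return the list of text nodes inside the element with `element_id`."""
    parser = _TextById(element_id)
    parser.feed(html_source)
    parser.close()
    return parser.chunks


ELEMENT_ID = "equity-derivative-op-timeStamp"

# What the server sends: the span exists but has no text yet.
served = '<span id="equity-derivative-op-timeStamp" class="asondate"></span>'
# Assumed shape after JavaScript has run (illustrative value only).
rendered = ('<span id="equity-derivative-op-timeStamp" class="asondate">'
            '23-Dec-2019 15:30:00</span>')

print(text_inside(served, ELEMENT_ID))    # []
print(text_inside(rendered, ELEMENT_ID))  # ['23-Dec-2019 15:30:00']
```

The element is found in both cases; only its text differs, which is why the XPath in the question matches nothing when it asks for text().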

That data is populated via JavaScript after the page loads. You won't be able to fetch it from this page using the requests module, because requests does not execute JavaScript; you'll need something like Selenium, which drives a real browser capable of running JavaScript.
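A rough sketch of the Selenium route might look like the following. It assumes Selenium 4 with Firefox and geckodriver installed; the function name `fetch_rendered_text` and the 15-second timeout are choices made for this illustration, not anything the site or the library prescribes.

```python
def fetch_rendered_text(url, element_id, timeout=15):
    """Load `url` in a real browser and return the element's text once it is non-empty."""
    # Imported here so this file can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(url)
        # Poll until JavaScript has put some text into the element.
        # textContent is read instead of .text because the element may sit
        # in a tab that is not currently visible.
        return WebDriverWait(driver, timeout).until(
            lambda d: (d.find_element(By.ID, element_id)
                        .get_property("textContent") or "").strip()
        )
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Calling `fetch_rendered_text(url, "equity-derivative-op-timeStamp")` with the URL from the question would then wait for the timestamp to appear, raising Selenium's TimeoutException if it never does.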

As larsks says in his answer:

That data is populated via JavaScript after the page loads.

But the data is loaded through XHR (XMLHttpRequest) calls, which you can inspect. In Firefox, right-click on the page, select Inspect Element, select Network, select XHR, and refresh the page. Then right-click on the request of interest and open it in a new tab.

Doing this, I identified that https://beta.nseindia.com/api/option-chain-indices?symbol=NIFTY may be of interest to you. It returns JSON, which you can use like any JSON object:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0'
}

params = (
    ('symbol', 'NIFTY'),
)

response = requests.get('https://beta.nseindia.com/api/option-chain-indices', headers=headers, params=params)
j = response.json()
print(j['records']['timestamp'])

Output:

23-Dec-2019 15:30:00
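If the timestamp is needed as a date rather than a string, it could be parsed along these lines. This is a sketch that assumes only the part of the JSON layout shown above (a `records` object holding a `timestamp` string) and infers the date format from the sample output; `extract_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def extract_timestamp(payload):
    """Return records.timestamp from the decoded JSON as a datetime."""
    try:
        raw = payload["records"]["timestamp"]
    except (KeyError, TypeError) as exc:
        raise ValueError("response has no records.timestamp field") from exc
    # Format inferred from the sample output, e.g. "23-Dec-2019 15:30:00".
    # %b matches English month abbreviations under the default C locale.
    return datetime.strptime(raw, "%d-%b-%Y %H:%M:%S")


# Stand-in for response.json(), reduced to the one field used here.
sample = {"records": {"timestamp": "23-Dec-2019 15:30:00"}}
print(extract_timestamp(sample).isoformat())  # 2019-12-23T15:30:00
```

Raising a ValueError for a missing field makes it easier to notice when the endpoint returns something other than the expected JSON, for example an error page served to a request without suitable headers.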

You need to supply a header for this particular request, as above. To determine which headers a particular request needs, in Firefox right-click on the page, select Inspect Element, select Network, and refresh the page. Right-click on the request you want, select Copy, then Copy as cURL. Paste what you copied into https://curl.trillworks.com and use the generated code. If it works, remove headers one at a time until you are left with a minimal set that still works. The process in Chrome is similar.
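The "remove headers one at a time" step can also be automated. The sketch below is one possible way to do it: `minimize_headers` and `fake_works` are hypothetical names, and the check function here is a stand-in that pretends the server only insists on a User-Agent, so the example runs without network access.

```python
def minimize_headers(headers, works):
    """Drop headers one at a time, keeping only those the request needs.

    `works` is a caller-supplied function that takes a headers dict and
    returns True when the request still succeeds with it.
    """
    if not works(headers):
        raise ValueError("the full header set does not work")
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # request still succeeds, so this header is optional
    return needed


# Stand-in for a real check: pretend the server only requires a User-Agent.
def fake_works(h):
    return "User-Agent" in h


copied = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
}
print(minimize_headers(copied, fake_works))  # {'User-Agent': 'Mozilla/5.0'}
```

With requests, the check could be something like `lambda h: requests.get(url, headers=h, params=params).ok`, though a stricter test (for example, that the body parses as JSON) may be more reliable, and repeated automated requests should be kept to a polite rate.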
