
xpath usage while scraping data using lxml

I am trying to write a Python script to scrape data from a webpage. However, I cannot figure out the correct XPath to retrieve the value. Please help me fix this.

The URL in question is https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=10400.00&type=CE&expiry=30NOV2017

I am trying to get the VWAP value, which at present is 27.16 (this value changes every business day). When I inspect the value in Chrome, I see the following element for the required value:

<span id="vwap">27.16</span>

Following an online tutorial, I wrote the following Python script:

from lxml import html
import requests
page = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=10400.00&type=CE&expiry=30NOV2017')
tree = html.fromstring(page.content)
vwap = tree.xpath('//span[@id="vwap"]/text()')
print(vwap)

But when I run this script, I get the following output:

[]

instead of

27.16

I have also tried replacing the XPath line with the following, as suggested in another answer on Stack Overflow, but I still do not get the correct output.

vwap = tree.xpath('//*[@id="vwap"]/text()')

Please let me know what to put inside the XPath so that I get the correct value in the vwap variable.

Any other solutions (other than lxml) are also welcome.

If you check the page source as it is initially served, the required node looks like this:

<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>

while this is how it appears after the JavaScript has executed:

<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>

Note that the span has no text content in the first HTML sample. requests does not execute JavaScript, so this is the markup lxml parses, and the XPath returns an empty list.
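To make the difference concrete, here is a minimal sketch that runs the same XPath from the question against the two HTML samples above. The variable names are made up for illustration, and the markup is copied from the samples rather than fetched from the site.

```python
from lxml import html

# Markup as initially served, before any JavaScript runs (copied from the first sample).
initial = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>'
# Markup after the browser has executed the page's JavaScript (second sample).
rendered = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>'

xpath = '//span[@id="vwap"]/text()'

# The XPath matches the span in both cases, but text() only yields
# something when the span actually contains a text node.
initial_result = html.fromstring(initial).xpath(xpath)
rendered_result = html.fromstring(rendered).xpath(xpath)

print(initial_result)   # empty list: the span exists but holds no text
print(rendered_result)  # one text node: '27.16'
```

So the XPath itself is fine; the text simply is not in the document that requests downloads.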

It seems that the values come from the node below:

<div id="responseDiv" style="display:none">
{"valid":"true","isinCode":null,"lastUpdateTime":"29-NOV-2017 15:30:30","ocLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=NIFTY&instrument=-&date=-&segmentLink=17&symbolCount=2","tradedDate":"29NOV2017","data":[{"change":"-17.80","sellPrice1":"13.80","buyQuantity3":"450","sellPrice2":"13.85","buyQuantity4":"150","buyQuantity1":"13,725","ltp":"-243019.52","buyQuantity2":"6,225","sellPrice5":"14.00","sellPrice3":"13.90","buyQuantity5":"450","sellPrice4":"13.95","underlying":"NIFTY","bestSell":"-2,41,672.50","annualisedVolatility":"9.44","optionType":"CE","prevClose":"31.10","pChange":"-57.23","lastPrice":"13.30","lowPrice":"11.00","strikePrice":"10400.00","premiumTurnover":"11,707.33","numberOfContractsTraded":"5,74,734","underlyingValue":"10,361.30","openInterest":"58,96,350","impliedVolatility":"12.73","vwap":"27.16","totalBuyQuantity":"10,49,850","openPrice":"35.10","closePrice":"17.85","bestBuy":"-2,43,852.25","changeinOpenInterest":"1,60,800","clientWisePositionLimits":"30517526","totalSellQuantity":"11,07,825","dailyVolatility":"0.49","sellQuantity5":"19,800","marketLot":"75","expiryDate":"30NOV2017","marketWidePositionLimits":"-","sellQuantity2":"75","sellQuantity1":"3,825","buyPrice1":"13.00","sellQuantity4":"900","buyPrice2":"12.90","sellQuantity3":"2,025","buyPrice4":"12.75","buyPrice3":"12.80","buyPrice5":"12.65","turnoverinRsLakhs":"44,94,632.53","pchangeinOpenInterest":"2.80","settlementPrice":"-","instrumentType":"OPTIDX","highPrice":"40.85"}],"companyName":"Nifty 50","eqLink":""}
</div>

so the code that you might need is

import json

# The quote data is embedded as JSON text inside the hidden responseDiv
raw = tree.xpath('//div[@id="responseDiv"]/text()')[0].strip()
vwap = json.loads(raw)['data'][0]['vwap']
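A slightly more defensive version of the same idea might look like the sketch below. It assumes the page still embeds its data as JSON inside the div with id responseDiv, which may no longer be true for the live site. The helper name extract_quote_field and the trimmed SAMPLE_PAGE are hypothetical, introduced only so the example runs without a network request.

```python
import json
from lxml import html

# Trimmed stand-in for the page source; only a few fields of the JSON are kept.
SAMPLE_PAGE = '''
<html><body>
<li><a title="VWAP">VWAP</a> <span id="vwap"></span></li>
<div id="responseDiv" style="display:none">
{"valid":"true","data":[{"vwap":"27.16","settlementPrice":"-","underlyingValue":"10,361.30"}]}
</div>
</body></html>
'''


def extract_quote_field(page_source, field):
    """Hypothetical helper: return one field of the first record as a float, or None."""
    tree = html.fromstring(page_source)
    chunks = tree.xpath('//div[@id="responseDiv"]/text()')
    if not chunks:
        return None  # the div is missing or empty
    try:
        payload = json.loads(''.join(chunks).strip())
    except ValueError:
        return None  # the div did not contain valid JSON
    records = payload.get('data') if isinstance(payload, dict) else None
    if not records:
        return None
    value = records[0].get(field)
    if not isinstance(value, str):
        return None
    try:
        # Values are strings and may use thousands separators, e.g. "10,361.30".
        return float(value.replace(',', ''))
    except ValueError:
        return None  # placeholder values such as "-"


print(extract_quote_field(SAMPLE_PAGE, 'vwap'))
print(extract_quote_field(SAMPLE_PAGE, 'underlyingValue'))
```

With a real response you would pass page.content (or page.text) instead of SAMPLE_PAGE. Returning None rather than raising keeps a daily scraping job from crashing when the page layout changes.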


 