
xpath usage while scraping data using lxml

I am trying to write a Python script to scrape data from a web page, but I cannot work out the correct XPath to get the value I need. Please help me fix this.

The URL in question is https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=10400.00&type=CE&expiry=30NOV2017

I am trying to get the VWAP value, which is currently 27.16 (it changes every working day). When I inspect the value in Chrome, I see the following element for it:

<span id="vwap">27.16</span>

Following an online tutorial, I wrote this Python script:

from lxml import html
import requests
page = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=10400.00&type=CE&expiry=30NOV2017')
tree = html.fromstring(page.content)
vwap = tree.xpath('//span[@id="vwap"]/text()')
print(vwap)

But when I run it, I get the following output

[]

instead of

27.16

I also tried replacing the xpath line as suggested in another Stack Overflow answer, but I still do not get the correct output:

vwap = tree.xpath('//*[@id="vwap"]/text()')

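Please let me know what to put in the xpath so that the `vwap` variable gets the correct value.

Solutions that use something other than lxml are also welcome.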

If you check the page source as it is initially served, the required node looks like this:

<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>

And this is what it looks like after the JavaScript has executed:

<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>

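Note that the first HTML sample has no text content. `requests` only fetches the page as served and does not execute JavaScript, so `text()` has nothing to match and the XPath returns an empty list.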
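To see the difference concretely, the two snippets above can be parsed directly with lxml. This is only a sketch: the variable names (`raw`, `rendered`, `query`) are made up for illustration, and the markup is copied from the samples above rather than fetched live.

```python
from lxml import html

# Markup as served, before any JavaScript runs (copied from the first sample).
raw = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap"></span></li>'
# Markup after the browser has executed the page's JavaScript (second sample).
rendered = '<li><a style="color: #000000;" title="VWAP">VWAP</a> <span id="vwap">27.16</span></li>'

query = '//span[@id="vwap"]/text()'

# The span is found in both cases, but only the rendered one has a text node.
raw_result = html.fromstring(raw).xpath(query)
rendered_result = html.fromstring(rendered).xpath(query)

print(raw_result)
print(rendered_result)
```

The XPath itself is fine. It is the input document that lacks the text.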

The value appears to come from the node below, which is present in the initial page source:

<div id="responseDiv" style="display:none">
{"valid":"true","isinCode":null,"lastUpdateTime":"29-NOV-2017 15:30:30","ocLink":"\/marketinfo\/sym_map\/symbolMapping.jsp?symbol=NIFTY&instrument=-&date=-&segmentLink=17&symbolCount=2","tradedDate":"29NOV2017","data":[{"change":"-17.80","sellPrice1":"13.80","buyQuantity3":"450","sellPrice2":"13.85","buyQuantity4":"150","buyQuantity1":"13,725","ltp":"-243019.52","buyQuantity2":"6,225","sellPrice5":"14.00","sellPrice3":"13.90","buyQuantity5":"450","sellPrice4":"13.95","underlying":"NIFTY","bestSell":"-2,41,672.50","annualisedVolatility":"9.44","optionType":"CE","prevClose":"31.10","pChange":"-57.23","lastPrice":"13.30","lowPrice":"11.00","strikePrice":"10400.00","premiumTurnover":"11,707.33","numberOfContractsTraded":"5,74,734","underlyingValue":"10,361.30","openInterest":"58,96,350","impliedVolatility":"12.73","vwap":"27.16","totalBuyQuantity":"10,49,850","openPrice":"35.10","closePrice":"17.85","bestBuy":"-2,43,852.25","changeinOpenInterest":"1,60,800","clientWisePositionLimits":"30517526","totalSellQuantity":"11,07,825","dailyVolatility":"0.49","sellQuantity5":"19,800","marketLot":"75","expiryDate":"30NOV2017","marketWidePositionLimits":"-","sellQuantity2":"75","sellQuantity1":"3,825","buyPrice1":"13.00","sellQuantity4":"900","buyPrice2":"12.90","sellQuantity3":"2,025","buyPrice4":"12.75","buyPrice3":"12.80","buyPrice5":"12.65","turnoverinRsLakhs":"44,94,632.53","pchangeinOpenInterest":"2.80","settlementPrice":"-","instrumentType":"OPTIDX","highPrice":"40.85"}],"companyName":"Nifty 50","eqLink":""}
</div>

So the code you need is probably:

import json

vwap = json.loads(tree.xpath('//div[@id="responseDiv"]/text()')[0].strip())['data'][0]['vwap']
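The one-liner above raises an exception if the `responseDiv` node is missing or its content is not valid JSON. A slightly more defensive variant might look like the sketch below. The helper name `extract_quote_field` and the `sample` markup are hypothetical: the sample is a trimmed copy of the `responseDiv` content shown earlier, not a live response, and the real page layout may change.

```python
import json

from lxml import html


def extract_quote_field(page_source, field="vwap"):
    """Return one field from the JSON embedded in <div id="responseDiv">, or None.

    Hypothetical helper: assumes the page still embeds its quote data as a
    JSON object with a "data" list inside that hidden div.
    """
    tree = html.fromstring(page_source)
    # string(...) yields '' when the div is absent, so no IndexError is possible.
    payload = tree.xpath('string(//div[@id="responseDiv"])').strip()
    if not payload:
        return None
    try:
        records = json.loads(payload).get("data") or []
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not records:
        return None
    return records[0].get(field)


# Trimmed, illustrative copy of the markup shown above.
sample = """<html><body>
<li><a title="VWAP">VWAP</a> <span id="vwap"></span></li>
<div id="responseDiv" style="display:none">
{"valid":"true","data":[{"lastPrice":"13.30","vwap":"27.16","openInterest":"58,96,350"}]}
</div>
</body></html>"""

vwap = extract_quote_field(sample)
print(vwap)
```

With the question's script, the call would be `extract_quote_field(page.content)`. The values come back as strings, and some fields use comma grouping (for example `"58,96,350"`), so strip the commas before converting those to numbers.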

