Data web scraping in R

I am using R and rvest to scrape web data from www.nseindia.com. The first time I was able to download the data, but after that the following error message appears:

    Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

I am trying to get the first row of the index futures data.

My code is as follows:

    library("rvest")

    website_nifty_future_live<- read_html("https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures")

    nifty_spot<- website_nifty_future_live %>%
      + html_nodes(".alt:nth-child(2) td:nth-child(13)") %>%
       + html_text()
    nifty_spot<-as.numeric(gsub(",","",nifty_spot))

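The error is most likely caused by the "+" signs at the start of your continuation lines; I didn't get this error after removing them. They look like R console continuation prompts that were copied along with the code. As far as I can tell, with the `+` in place `%>%` pipes the page into `+` instead of into `html_nodes()`, so `html_nodes()` ends up being called on the CSS selector string itself. That string is a character vector, which is exactly what `xml_find_all` complains about.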
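To see the fix in isolation, here is a minimal offline sketch of the question's pipeline with the `+` prompts removed. It runs against a small, made-up HTML snippet (hypothetical markup, not the real NSE page) so it does not depend on the live site; the selector `tr.alt td:nth-child(2)` is likewise only meant for this toy table.

```r
library(rvest)  # also re-exports the %>% pipe

# Hypothetical stand-in for the NSE page: one header row and one data row.
page <- read_html('<table>
  <tr><th>Instrument</th><th>Last Price</th></tr>
  <tr class="alt"><td>Index Futures</td><td>10,096.90</td></tr>
</table>')

# Same shape of pipeline as in the question, without the console "+" prompts.
last_price <- page %>%
  html_nodes("tr.alt td:nth-child(2)") %>%
  html_text()

# Strip the thousands separators before converting to a number.
last_price <- as.numeric(gsub(",", "", last_price))
last_price
```

Each line now continues the pipe with `%>%` at the end of the previous line, which is all R needs to read the expression across several lines.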

I recommend reading the full table into a data.frame using the following code:

library("rvest")

url_nifty <- "https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures"
website_nifty_future_live<- read_html(url_nifty)

nifty_spot<- website_nifty_future_live %>%
   html_nodes("#tab26Content > table:nth-child(1)") %>%
   html_table(header = NA, trim = TRUE, fill = FALSE, dec = ".") %>%
   as.data.frame()
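Because `html_nodes()` returns a node set, `html_table()` gives back a list of data frames, and for a single matching table `as.data.frame()` turns that list into one data frame. A minimal sketch of an alternative, assuming you only need the first match: `html_node()` (singular) selects one node, so `html_table()` should return the table directly. The inline HTML below is a hypothetical, trimmed-down stand-in for the NSE table so the example runs offline.

```r
library(rvest)

# Hypothetical stand-in for the NSE futures table (first row holds <th> headers).
page <- read_html('<div id="tab26Content"><table>
  <tr><th>Instrument</th><th>Expiry Date</th><th>Last Price</th></tr>
  <tr><td>Index Futures</td><td>28SEP2017</td><td>10,096.90</td></tr>
</table></div>')

# html_node() keeps only the first match, so html_table() yields a single
# table rather than a list; header = TRUE uses the first row as column names.
futures <- page %>%
  html_node("#tab26Content > table:nth-child(1)") %>%
  html_table(header = TRUE)

# First row, including the column names.
futures[1, ]
```

Depending on the rvest version, the result may be a plain `data.frame` or a tibble; row indexing with `[1, ]` works the same way on both.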

It is then easy to get the first row, including the headers, e.g. with:

    nifty_spot[1, ]
         Instrument Underlying Expiry.Date Option.Type Strike.Price Open.Price High.Price Low.Price Prev..Close Last.Price Volume Turnover.lacs.
    1 Index Futures      NIFTY   28SEP2017           -            -  10,105.00  10,144.70 10,078.00   10,107.90  10,096.90 94,799    7,18,943.53
      Underlying..Value
    1           10079.3
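Most of these values come back as text with Indian-style thousands separators (e.g. `7,18,943.53`), so they need the same `gsub()` clean-up as in the question before any arithmetic. A minimal sketch on a hypothetical two-column data frame whose values are copied from the output above (the real table has more columns, and which ones to convert is up to you):

```r
# Hypothetical excerpt of the scraped table: the values are character strings.
nifty_spot <- data.frame(
  Last.Price = "10,096.90",
  Turnover.lacs. = "7,18,943.53",
  stringsAsFactors = FALSE
)

# Remove every comma, then convert the result to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Apply the conversion to the selected price columns only.
price_cols <- c("Last.Price", "Turnover.lacs.")
nifty_spot[price_cols] <- lapply(nifty_spot[price_cols], to_number)

str(nifty_spot)
```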

Hope it helps!

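Statement: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.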

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM