
Data web scraping in R

I am using R and rvest to scrape data from www.nseindia.com. The first time I run the code the data downloads fine, but after that I get the following error message:

Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

I am trying to get the first row of the index futures table.

My code is as follows:

    library("rvest")

    website_nifty_future_live<- read_html("https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures")

    nifty_spot<- website_nifty_future_live %>%
      + html_nodes(".alt:nth-child(2) td:nth-child(13)") %>%
       + html_text()
    nifty_spot<-as.numeric(gsub(",","",nifty_spot))

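The error is most likely caused by the "+" signs at the start of the continuation lines; I did not get the error after removing them. Those "+" characters are the R console's continuation prompt, not part of the code. When they are left in, the line after %>% is parsed as a unary plus applied to html_nodes("..."), so html_nodes() receives the CSS selector string as its first argument instead of the parsed page, which would explain why xml_find_all complains about an object of class "character".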
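
As a minimal sketch of the pipeline without the "+" prompts, the snippet below runs the same three steps (select nodes, extract text, strip commas) against a small inline HTML table. The inline table, the selector "tr.alt td:nth-child(2)" and the variable names are assumptions made up for illustration so the example runs offline; they are not the structure of the real NSE page.

```r
library(rvest)

# Hypothetical stand-in for the NSE page, so the example runs without network access.
page <- read_html('
  <table>
    <tr><th>Instrument</th><th>Last Price</th></tr>
    <tr class="alt"><td>Index Futures</td><td>10,096.90</td></tr>
  </table>')

# Same pipeline shape as in the question, with no leading "+" on the continuation lines.
last_price <- page %>%
  html_nodes("tr.alt td:nth-child(2)") %>%
  html_text()

# Remove the thousands separator before converting to numeric.
last_price <- as.numeric(gsub(",", "", last_price))
```

Each continuation line starts directly with the function call, so the pipe passes the parsed page as the first argument of html_nodes().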

I recommend reading the full table as a data frame using the following code:

    library("rvest")

    # Download and parse the futures watch page
    url_nifty <- "https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures"
    website_nifty_future_live <- read_html(url_nifty)

    # Select the futures table and convert it to a data frame
    nifty_spot <- website_nifty_future_live %>%
      html_nodes("#tab26Content > table:nth-child(1)") %>%
      html_table(header = NA, trim = TRUE, fill = FALSE, dec = ".") %>%
      as.data.frame()

It is then easy to get the first row, including the headers, e.g. with:

    nifty_spot[1, ]
         Instrument Underlying Expiry.Date Option.Type Strike.Price Open.Price High.Price Low.Price Prev..Close Last.Price Volume Turnover.lacs.
    1 Index Futures      NIFTY   28SEP2017           -            -  10,105.00  10,144.70 10,078.00   10,107.90  10,096.90 94,799    7,18,943.53
      Underlying..Value
    1           10079.3
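
The price, volume and turnover columns in this output still contain commas, which suggests they are stored as character strings. A minimal sketch of converting them, reusing the gsub() approach from the question, could look like this. The one-row data frame nifty_row and the helper to_number() are hypothetical, with values copied from the output above so the example runs on its own.

```r
# Hypothetical one-row data frame mimicking a few columns of the output above.
nifty_row <- data.frame(
  Instrument = "Index Futures",
  Open.Price = "10,105.00",
  Last.Price = "10,096.90",
  Turnover.lacs. = "7,18,943.53",
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every comma, then convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Convert only the columns that hold comma-formatted numbers.
num_cols <- c("Open.Price", "Last.Price", "Turnover.lacs.")
nifty_row[num_cols] <- lapply(nifty_row[num_cols], to_number)
```

Removing every comma also handles the Indian digit grouping in the turnover column (7,18,943.53), since the grouping pattern does not matter once the separators are gone.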

Hope it helps!
