Web scraping data in R

Question

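I am using R and rvest to scrape web data from www.nseindia.com. The first time I was able to download the data, but after that I get the following error message:

Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

I am trying to get the first row, for Index Futures.

My code is as follows: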

    library("rvest")

    website_nifty_future_live<- read_html("https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures")

    nifty_spot<- website_nifty_future_live %>%
      + html_nodes(".alt:nth-child(2) td:nth-child(13)") %>%
       + html_text()
    nifty_spot<-as.numeric(gsub(",","",nifty_spot))

Answer 1

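The error is most likely caused by the "+" signs at the start of the lines; when I removed them I did not get this error. They look like console continuation prompts that were pasted into the script. With a leading "+", the pipe inserts the page as the first argument of `+` rather than of html_nodes(), so html_nodes() receives the selector string as its document, which matches the complaint about an object of class "character".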
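
To see the effect of the stray "+" without hitting the live site, here is a minimal sketch that runs offline. The inline HTML and the `.alt td` selector are made-up stand-ins, not the real NSE markup, and the names `with_plus` and `without_plus` are only for illustration.

```r
library(rvest)

# Tiny inline page standing in for the live NSE page (hypothetical markup).
page <- read_html("<table><tr class='alt'><td>10,096.90</td></tr></table>")

# With the stray "+", the pipe builds `+`(page, html_nodes(".alt td")),
# so html_nodes() is called with the selector string as its document.
# The error is caught and its message kept for inspection.
with_plus <- tryCatch(
  page %>% + html_nodes(".alt td"),
  error = function(e) conditionMessage(e)
)

# Without the "+", the pipe passes `page` to html_nodes() as intended.
without_plus <- page %>%
  html_nodes(".alt td") %>%
  html_text()

print(with_plus)
print(without_plus)
```

The first call should fail in the same way as the code in the question, while the second returns the cell text.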

I would suggest reading the whole table into a data.frame with the following code:

    library("rvest")

    url_nifty <- "https://www.nseindia.com/live_market/dynaContent/live_watch/fomwatchsymbol.jsp?key=NIFTY&Fut_Opt=Futures"
    website_nifty_future_live <- read_html(url_nifty)

    nifty_spot <- website_nifty_future_live %>%
      html_nodes("#tab26Content > table:nth-child(1)") %>%
      html_table(header = NA, trim = TRUE, fill = FALSE, dec = ".") %>%
      as.data.frame()

After that it is easy to get the first row, including the headers, e.g.

    nifty_spot[1, ]
         Instrument Underlying Expiry.Date Option.Type Strike.Price Open.Price High.Price Low.Price Prev..Close Last.Price Volume Turnover.lacs.
    1 Index Futures      NIFTY   28SEP2017           -            -  10,105.00  10,144.70 10,078.00   10,107.90  10,096.90 94,799    7,18,943.53
      Underlying..Value
    1           10079.3
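
Most columns in that row are still strings with comma separators, so they need the same gsub() treatment as in the question before they can be used as numbers. A minimal sketch, assuming a hand-built data frame that copies a few values from the output above so it runs offline; `first_row` and the helper `to_number()` are hypothetical names:

```r
# A few columns copied from the printed row; not fetched from the site.
first_row <- data.frame(
  Instrument     = "Index Futures",
  Underlying     = "NIFTY",
  Last.Price     = "10,096.90",
  Turnover.lacs. = "7,18,943.53",
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every comma, then convert to numeric.
# This also copes with Indian-style grouping such as "7,18,943.53".
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

last_price <- to_number(first_row$Last.Price)
turnover   <- to_number(first_row$Turnover.lacs.)

print(last_price)
print(turnover)
```

Placeholder cells such as "-" would become NA with a warning under this helper, so those columns are probably best left as text.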

Hope this helps!
