For loop using dates in a URL

I have a URL that looks like this: http://nationalbank.kz/?docid=105&cmomdate=2018-01-03&switch=english

I would like to loop over all dates starting from 2015 and store the data in a data frame. I get an error when I run the following:

library(rvest)  # read_html(), html_nodes(), html_table(); also re-exports %>%

StartDate <- "2017-07-01"
EndDate <- "2017-07-10"
dates <- seq(as.Date(StartDate, format="%Y-%m-%d"),
             as.Date(EndDate, format="%Y-%m-%d"), by='days')

ML = list()

for (date in dates) {
  url = paste0("http://nationalbank.kz/?docid=105&cmomdate=",
               as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"),
               "&switch=english")
  p <- url %>%
    read_html() %>%
    html_nodes(xpath='//table[1]') %>%
    html_table(fill = T)
  dt = p[[11]]
  tdt = as.data.frame(dt)

  ML[[date]] = tdt
}

all = do.call(rbind, ML)
all

The error message is:

Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match

But when I run it for a single date, it seems to work:

url <- "http://nationalbank.kz/?docid=105&cmomdate=20187-07-01&switch=english"

p <- url %>%
  read_html() %>%
  html_nodes(xpath='//table[1]') %>%
  html_table(fill = T)
dt = p[[11]]
tdt = t(dt)
tdt

ML = list()

for (i in 1:3) {
  ML[[i]] = tdt
}

all = do.call(rbind, ML)
all

The output is:

   [,1]               [,2]           [,3]       [,4]               
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
   [,5]                [,6]              [,7]             
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
   [,8]                         [,9]                      
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
   [,10]                      [,11]                     
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
   [,12]                     [,13]           [,14]           
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %"      
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %"      
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %" 

What is wrong with my previous code?

It looks like the web page returns inconsistently formatted pages, so p[[11]] does not always return the same table, and rbind() then fails when it tries to combine data frames of different sizes. The code below highlights the issue: the inserted print() displays each date together with the length of the list assigned to p. The date that throws things off is '2008-04-04'. The fix below simply checks whether the list length is 14 and, if so, adds the table to ML; do.call(rbind, ML) then concatenates the tables as expected.

library(rvest)
StartDate <- "2017-07-01"
EndDate <- "2017-07-10"
dates <- seq(as.Date(StartDate, format="%Y-%m-%d"),
             as.Date(EndDate, format="%Y-%m-%d"), by='days')

ML = list()

for (date in dates) {
  url = paste0("http://nationalbank.kz/?docid=105&cmomdate=",
               as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"),
               "&switch=english")
  p <- url %>%
    read_html() %>%
    html_nodes(xpath='//table[1]') %>%
    html_table(fill = T)

  # show each requested date and how many tables the page returned
  print(paste(as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"), length(p)))

  # only keep pages with the expected layout of 14 tables
  if (length(p) == 14) {
    dt = p[[11]]
    tdt = as.data.frame(dt)

    ML[[date]] = tdt
  }
}

all = do.call(rbind, ML)
all
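The failure mode described in this answer can be reproduced offline. The sketch below uses made-up toy data (the data frames `a` and `b` are hypothetical stand-ins for tables scraped on two different dates) to show that rbind() refuses to stack data frames whose column counts differ, and that filtering on the expected shape first, in the spirit of the length(p) == 14 check, lets do.call(rbind, ...) succeed.

```r
# Hypothetical stand-ins for tables scraped on two different dates:
# the second page yields a table with one extra column.
a <- data.frame(X1 = "NIN", X2 = "KZW1KD072398")
b <- data.frame(X1 = "NIN", X2 = "KZW1KD072398", X3 = "extra")

# rbind() on data frames requires the same number of columns
msg <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))
print(msg)

# Keep only tables of the expected shape before stacking them
tables <- list(a, b, a)
ok <- Filter(function(tbl) ncol(tbl) == 2, tables)
combined <- do.call(rbind, ok)
print(nrow(combined))
```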

Apparently, looping directly over a vector of dates is not a good idea: for() drops the Date class and hands the loop plain day counts. So I made the following modification:

for (i in seq_along(dates)) {
  di <- dates[i]  # indexing keeps the Date class
  url <- paste0("http://nationalbank.kz/?docid=105&cmomdate=",
                format(di, "%Y-%m-%d"),
                "&switch=english")
  # ... read the page and extract the table as before ...
}
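Why iterating over the dates directly misbehaves can be checked in base R without touching the website. In this minimal sketch, the loop variable comes out as a plain number; converting it back with origin = "1960-10-01" (as both earlier loops do) shifts every date 3379 days earlier, because R counts days from 1970-01-01, so 2017-07-01 is requested as 2008-03-31. That is why the dates printed by the diagnostic loop fall in 2008 rather than 2017. Using the number as a list index, as ML[[date]] does, also pads ML with thousands of NULL entries. Indexing with seq_along() avoids all three problems.

```r
dates <- seq(as.Date("2017-07-01"), as.Date("2017-07-10"), by = "days")

# for() iterates over the underlying day counts, so the Date class is dropped
for (d in dates) {
  first_value <- d
  break
}
print(class(first_value))  # a plain number, not "Date"

# R counts days from 1970-01-01; any other origin shifts the result
print(as.Date(first_value, origin = "1970-01-01"))  # the intended 2017-07-01
print(as.Date(first_value, origin = "1960-10-01"))  # 2008-03-31, 3379 days earlier

# Using the day count as a list index pads the list with NULLs
ML <- list()
ML[[first_value]] <- "table"
print(length(ML))

# Indexing the vector keeps the Date class, so no origin is needed
di <- dates[1]
print(class(di))
print(format(di, "%Y-%m-%d"))
```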

Also, @Soren suggested checking whether length(p) == 14, which really helped. However, the length of p is a less reliable test, as the page may not contain the table at all. Instead of checking length(p), I decided to check nrow(dt) == 14: only if the table has exactly 14 rows is it stored in the list ML.
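A slightly more defensive version of the same loop might look like the sketch below. It is an illustration under assumptions, not a tested scraper: collect_tables() and its fetch_tables argument are hypothetical names, the fetcher is passed in so the logic can be tried without network access, and "table 11 with 14 rows" is simply the layout observed in this thread. Besides the nrow(dt) == 14 test, it checks that the page returned at least 11 tables before taking the 11th (otherwise p[[11]] itself fails with "subscript out of bounds"), wraps the download in tryCatch() so one bad date does not abort the whole run, and records the requested date in each kept table.

```r
# Hypothetical helper: fetch_tables(url) must return a list of data frames,
# e.g. a wrapper around rvest (needs the rvest package and network access):
#   fetch_tables <- function(url) {
#     rvest::html_table(rvest::html_nodes(rvest::read_html(url), xpath = "//table[1]"))
#   }
collect_tables <- function(dates, fetch_tables, table_index = 11, expected_rows = 14) {
  results <- vector("list", length(dates))
  for (i in seq_along(dates)) {
    di <- dates[i]  # indexing keeps the Date class
    url <- paste0("http://nationalbank.kz/?docid=105&cmomdate=",
                  format(di, "%Y-%m-%d"), "&switch=english")
    # A failed download or parse just skips this date
    tables <- tryCatch(fetch_tables(url), error = function(e) NULL)
    # Too few tables: tables[[table_index]] would fail with "subscript out of bounds"
    if (length(tables) < table_index) next
    dt <- as.data.frame(tables[[table_index]])
    # Keep only tables with the expected layout (the nrow(dt) == 14 check)
    if (nrow(dt) != expected_rows) next
    dt$date <- di
    results[[i]] <- dt
  }
  # Drop skipped dates and stack the rest; NULL if nothing matched
  kept <- Filter(Negate(is.null), results)
  if (length(kept) == 0) return(NULL)
  do.call(rbind, kept)
}

# Offline usage with a fake fetcher (made-up data)
fake_fetch <- function(url) {
  if (grepl("2017-07-02", url)) stop("page not available")
  if (grepl("2017-07-03", url)) return(list(data.frame(X1 = "only one table")))
  target <- data.frame(X1 = paste0("label", 1:14), X2 = paste0("value", 1:14))
  c(rep(list(data.frame(X1 = "other")), 10), list(target))
}
all_tables <- collect_tables(seq(as.Date("2017-07-01"), as.Date("2017-07-04"), by = "days"),
                             fake_fetch)
print(unique(all_tables$date))
```

Injecting the fetcher keeps the skip-and-filter logic separate from the HTTP and HTML details, so the same function can be pointed at the real site once the fake fetcher behaves as expected.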

Happy to see any more robust solutions.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact yoyou2525@163.com.