I have a rather wide dataset to read in with over 1000 missing values at the top, but all the variable names follow the same pattern. Is there a way to use starts_with() to force certain variables to be parsed correctly?


mwe.csv <- data.frame(id        = c("a", "b"), # not where I actually get the data from
                      amount1   = c(NA, 20),
                      currency1 = c(NA, "USD"))

mwe <- readr::read_csv("mwe.csv", guess_max = 1) # guess_max = 1 for example purposes

I'd like to be able to do

mwe <- read_csv("mwe.csv", guess_max = 1,
         col_types = cols(starts_with("amount") = "d",
                          starts_with("currency") = "c"))

> mwe
# A tibble: 2 x 3
  id    amount1 currency1
  <chr>   <dbl> <chr>    
1 a          NA NA       
2 b          20 USD      

But I get the error "unexpected '=' in: read_csv". Any thoughts? I cannot hard-code the column types because the number of columns will change regularly, but the pattern (amountN) will be constant. There will also be other columns that are not id or amount/currency. I would prefer not to increase guess_max, for speed reasons.

The answer is to cheat!

library(readr)

mwe             <- read_csv("mwe.csv", n_max = 0) # only need the col_names
cnames          <- attr(mwe, "spec") # grab the column specification
ctype           <- rep("?", ncol(mwe)) # one col_types abbreviation per column -- all guesses
currency        <- grepl("currency", names(cnames$cols)) # which ones are currency?
                # or use base::startsWith(names(cnames$cols), "currency")
ctype[currency] <- "c" # do not guess on currency ones, use character
amount          <- grepl("amount", names(cnames$cols)) # same again for the amount columns
ctype[amount]   <- "d" # parse amounts as double
mwe             <- read_csv("mwe.csv", col_types = paste(ctype, collapse = ""))
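
If you would rather not build the abbreviation string by hand, the same trick can probably be done with a named list of collectors passed through cols(). This is only a sketch: the temporary file and the helper names (path, amount_cols, currency_cols, types) are mine, not part of your data, and it assumes every matching column really should be double or character.

```r
library(readr)

# Write the toy data to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(id = c("a", "b"), amount1 = c(NA, 20), currency1 = c(NA, "USD")),
  path
)

# Read only the header row to learn the column names.
cnames <- names(read_csv(path, n_max = 0,
                         col_types = cols(.default = col_character())))

# Pick out the columns that match each prefix.
amount_cols   <- cnames[startsWith(cnames, "amount")]
currency_cols <- cnames[startsWith(cnames, "currency")]

# Build a named list: one collector per matching column.
types <- c(
  setNames(rep(list(col_double()),    length(amount_cols)),   amount_cols),
  setNames(rep(list(col_character()), length(currency_cols)), currency_cols)
)

# Columns not named in `types` are still guessed (the default in cols()).
mwe <- read_csv(path, col_types = do.call(cols, types), guess_max = 1)
```

The header-only read is cheap, so guess_max can stay small for the remaining columns.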


