Tags: r, readr

I have a rather wide dataset to read in. Many columns have more than 1000 missing values at the top, so readr guesses their types incorrectly, but all the variable names follow the same pattern. Is there a way to use something like starts_with() to force certain variables to be parsed correctly?

MWE:

library(tidyverse)  # attaches readr, so a separate library(readr) is not needed
# Write a small example file; this is not where I actually get the data from
write_csv(data.frame(id        = c("a", "b"),
                     amount1   = c(NA, 20),
                     currency1 = c(NA, "USD")),
          "mwe.csv")

mwe <- readr::read_csv("mwe.csv", guess_max = 1) # guess_max = 1 only to reproduce the problem on a tiny file

I'd like to be able to do something like this (not valid R, which is the problem):

mwe <- read_csv("mwe.csv", guess_max = 1,
                col_types = cols(starts_with("amount")   = "d",
                                 starts_with("currency") = "c"))

> mwe
# A tibble: 2 x 3
  id    amount1 currency1
  <chr>   <dbl> <chr>
1 a          NA NA
2 b          20 USD

But I get the error "unexpected '=' in: read_csv". Any thoughts? I cannot hard-code the column types because the number of columns changes regularly, although the pattern (amountN, currencyN) stays constant. There will also be other columns besides id and the amount/currency ones. I would prefer not to increase the guess_max argument, for speed reasons.

The answer is to cheat!

header <- read_csv("mwe.csv", n_max = 0)      # read only the header row to get the column names
cnames <- names(header)                       # column names as a character vector
ctype  <- rep("?", length(cnames))            # one col_types abbreviation per column; "?" means guess
ctype[startsWith(cnames, "currency")] <- "c"  # currencyN columns: character, no guessing
ctype[startsWith(cnames, "amount")]   <- "d"  # amountN columns: double, no guessing
                                              # add further prefixes the same way as needed
                                              # (grepl("^currency", cnames) works too)
mwe <- read_csv("mwe.csv", col_types = paste(ctype, collapse = ""))
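
If you need this for several files, the same trick can be wrapped in a small function. The following is only a sketch: cols_by_prefix() is a hypothetical helper name (it is not part of readr), the file contents are made up for the demonstration, and it assumes that every column not matching a prefix should still be guessed. It builds a named list of type abbreviations from the header and hands it to cols() via do.call().

```r
library(readr)

# Hypothetical helper (not part of readr): build a column specification
# from a named character vector mapping name prefixes to type abbreviations.
cols_by_prefix <- function(file, prefixes) {
  # Read only the header row; reading everything as character avoids any guessing here.
  header <- read_csv(file, n_max = 0, col_types = cols(.default = col_character()))
  cnames <- names(header)
  spec <- list()
  for (p in names(prefixes)) {
    hits <- cnames[startsWith(cnames, p)]
    spec[hits] <- prefixes[[p]]   # e.g. amount1 = "d", amount2 = "d"
  }
  # Columns not listed fall back to the default of cols(), which is col_guess().
  do.call(cols, spec)
}

# Made-up demo file with the same shape as the question's data.
path <- tempfile(fileext = ".csv")
writeLines(c("id,amount1,currency1,amount2,currency2,note",
             "a,,,,,x",
             "b,20,USD,5,EUR,y"), path)

spec <- cols_by_prefix(path, c(amount = "d", currency = "c"))
mwe  <- read_csv(path, col_types = spec, guess_max = 1)
```

Because the amountN and currencyN columns are specified explicitly, guess_max only affects the remaining columns (id and note here), so it can stay small.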
