
R assign variable types to large data.frame from vector

I have a wide data.frame whose columns are all character vectors (df1). I have a separate vector (vec1) that contains the column classes I'd like to assign to each of the columns in df1.

If I were using read.csv(), I'd use the colClasses argument and set it equal to vec1, but there doesn't appear to be a similar option for an existing data.frame.

Any suggestions for a fast way to do this besides a loop?

I have run into the same need many times, so I wrote a function for it, in case it helps:

reclass <- function(df, vec){
  df[] <- Map(function(x, f){
    # switch lists the accepted short names in vec;
    # modify it and/or add more as needed
    f <- switch(f,
                as.is  = 'force',
                factor = 'as.factor',
                num    = 'as.numeric',
                char   = 'as.character',
                stop('unknown type: ', f))
    # take the name of the function and fetch the function itself
    f <- match.fun(f)
    # apply the function to the column
    f(x)
  },
  df,
  vec)
  df
}

It uses Map to apply one conversion per column of the data.frame: each element of the vector names the class for the corresponding column, so the vector must have exactly one element per column.

I also use switch so that the class names are shorter to type. Use as.is to keep a column's class unchanged; the rest are self-explanatory, I think.
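
As a rough sketch of how the short names could be extended, the variant below swaps the switch for a named list of converter functions and checks the length requirement up front. The names reclass2, converters, and chr_df, and the extra int, lgl, and date entries, are assumptions for illustration and are not part of the function above.

```r
# Hypothetical lookup table: short name -> converter function.
converters <- list(
  as.is  = identity,
  factor = as.factor,
  num    = as.numeric,
  int    = as.integer,
  lgl    = as.logical,
  date   = as.Date,
  char   = as.character
)

# reclass2 is an illustrative name, not the function from the answer.
reclass2 <- function(df, vec) {
  # one short name per column
  stopifnot(length(vec) == ncol(df))
  # fail early on names missing from the lookup table
  unknown <- setdiff(vec, names(converters))
  if (length(unknown) > 0) {
    stop("unknown type(s): ", paste(unknown, collapse = ", "))
  }
  # df[] keeps the data.frame structure while replacing every column
  df[] <- Map(function(x, f) converters[[f]](x), df, vec)
  df
}

# All-character input, as in the question.
chr_df <- data.frame(a = c("1", "2"), b = c("x", "y"),
                     c = c("2020-01-01", "2020-06-30"),
                     stringsAsFactors = FALSE)
out <- reclass2(chr_df, c("num", "factor", "date"))
str(out)
```

Keeping the converters in a list means a new type only needs one extra entry, and a typo in the vector produces an explicit error instead of a confusing failure further down.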

Small example (the output is from R before 4.0, where data.frame() converted strings to factors by default):

> df1 <- data.frame(1:10, letters[1:10], runif(50))
> str(df1)
'data.frame':   50 obs. of  3 variables:
 $ X1.10        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ runif.50.    : num  0.0969 0.1957 0.8283 0.1768 0.9821 ...

And after the function:

> df1 <- reclass(df1, c('num','as.is','char'))
> str(df1)
'data.frame':   50 obs. of  3 variables:
 $ X1.10        : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
 $ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ runif.50.    : chr [1:50] "0.0968757788650692" "0.19566105119884" "0.828283685725182" "0.176784737734124" ...

Map is still a loop internally, but it is a thin wrapper around mapply, whose iteration is implemented in C, so it should be fast enough.
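
For context, a small sketch that checks Map against a direct mapply call with SIMPLIFY = FALSE on an all-character data frame. The names chr_df, funs, and apply_fun are made up for this illustration.

```r
chr_df <- data.frame(a = c("1", "2", "3"), b = c("4.5", "5.5", "6.5"),
                     stringsAsFactors = FALSE)

# one converter function per column (illustrative choice)
funs <- list(as.integer, as.numeric)

# apply converter f to column x
apply_fun <- function(x, f) f(x)

via_map    <- Map(apply_fun, chr_df, funs)
via_mapply <- mapply(apply_fun, chr_df, funs, SIMPLIFY = FALSE)

# both are lists with one converted vector per column
identical(via_map, via_mapply)
```

Either result can be assigned back with df[] <- ... to keep the data.frame structure, as reclass does.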

Maybe you could try this function, which does the same job.

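reclass <- function(df, vec_types) {
  # vec_types holds one class name per column of df
  for (i in seq_along(df)) {
    # assigning a basic class such as 'integer' or 'character' coerces the column
    class(df[[i]]) <- vec_types[i]
  }
  df
}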

And this is an example of vec_types (a vector of types):

vec_types <- c('character', rep('integer', 3), rep('character', 2))

You can test the function (reclass) with this table (df):

# named df rather than table, so the base function table() is not masked
df <- data.frame(matrix(sample(1:10, 30, replace = TRUE), nrow = 5, ncol = 6))
str(df)  # original column types

# apply the function
df <- reclass(df, vec_types)
str(df)  # new column types
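
Assigning a class name this way suits basic vector types such as 'integer' and 'character'. As a hedged alternative for types like 'factor', a sketch that looks up the matching as.<type>() function instead might look like this. reclass_as and the example data are hypothetical names, and the approach assumes a function called as.<type> exists for every entry in vec_types.

```r
# reclass_as is an illustrative name, not the function from the answer.
reclass_as <- function(df, vec_types) {
  # one type name per column
  stopifnot(length(vec_types) == ncol(df))
  for (i in seq_along(df)) {
    # e.g. 'integer' -> as.integer, 'factor' -> as.factor
    convert <- match.fun(paste0("as.", vec_types[i]))
    df[[i]] <- convert(df[[i]])
  }
  df
}

set.seed(1)  # make the sampled example reproducible
df <- data.frame(matrix(sample(1:10, 30, replace = TRUE), nrow = 5, ncol = 6))
vec_types <- c("character", rep("integer", 3), "factor", "numeric")
df <- reclass_as(df, vec_types)
sapply(df, class)  # one class name per column
```

Comparing sapply(df, class) with vec_types is a quick way to confirm that every column ended up with the requested type.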
