Convert any number of vectors into a dataframe whilst preserving data types and using vector names as column names in R

Is there a simple function (preferably user-written, or found in base R) that takes any number of vectors and produces a dataframe whilst retaining the vectors' data types and using the vector variables' names as the column names?

An example

Inputs (vectors)

> var_a # character
[1] "a" "b" "c"

> var_b # numeric
[1] 1 3 4

> var_c # factor
[1] red   black black
Levels: black red

Desired output

  var_a var_b var_c
1     a     1   red
2     b     3 black
3     c     4 black

where the classes are

sapply(my_dataframe, class)

#      var_a       var_b       var_c 
#"character"   "numeric"    "factor"

Attempt 1 - Using cbind

Using cbind produces a matrix, which holds a single data type, so this method does not preserve the vectors' original data types (every column becomes character).

first_method <- cbind(var_a, var_b, var_c)
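A quick check of that coercion, as a minimal sketch that assumes the three example vectors are defined as below:

```r
# Example vectors matching the inputs shown above
var_a <- c("a", "b", "c")                       # character
var_b <- c(1, 3, 4)                             # numeric
var_c <- factor(c("red", "black", "black"))     # factor

first_method <- cbind(var_a, var_b, var_c)

is.matrix(first_method)   # TRUE: the result is a matrix, not a data frame
typeof(first_method)      # "character": one storage type for every column
colnames(first_method)    # "var_a" "var_b" "var_c": the names do survive
```

So cbind keeps the variable names but cannot keep per-column types.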

Attempt 2 - Using do.call (similar to here )

In this case the data types are lost and so are the names of the vector variables

library(magrittr)  # provides %>%

vec_list <- list(var_a, var_b, var_c)  # unnamed list, so the variable names are lost
second_method <- data.frame(do.call(cbind, vec_list))
second_method %>% sapply(class)
#       X1       X2       X3 
# "factor" "factor" "factor"

Attempt 3 - using data.frame

This method gets close (it retains the vector names as column names in the dataframe), but unfortunately it converts character data types into factors

third_method <- data.frame(var_a, var_b, var_c)
third_method %>% sapply(class)
#    var_a     var_b     var_c 
# "factor" "numeric"  "factor" 

Attempt 4 - Manually declaring each column of the dataframe, AND its name, AND its datatype

This returns the desired output. However, it is not elegant: it takes a lot of manual coding for large numbers of vectors, and it is prone to user error because the user must specify the data type manually for each column.

fourth_method <- data.frame("var_a"=as.character(var_a), "var_b"=as.numeric(var_b), "var_c"=as.factor(var_c), stringsAsFactors = FALSE)
fourth_method %>% sapply(class)

#      var_a       var_b       var_c 
#"character"   "numeric"    "factor" 

Note: this, this, and this solution are unsuitable because they lose the data types.

Also note: the vectors in this question are not named vectors, as referred to in this question.

At this point, I am running low on ideas and am unsure what to try next.

This works fine with data.frame. You just need to add the argument stringsAsFactors = FALSE. (From R 4.0.0 onward, FALSE is already the default.)

df <- data.frame(var_a, var_b, var_c, stringsAsFactors = FALSE)
sapply(df, class)
#       var_a       var_b       var_c 
# "character"   "numeric"    "factor" 
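Since the question asks for a user-written function that accepts any number of vectors, here is one possible sketch built on the same idea. The name vectors_to_df is hypothetical, and the sketch assumes that every argument is passed as a bare variable name and that all vectors have the same length.

```r
# Hypothetical helper: builds a data frame from any number of vectors,
# using the variable names from the call as column names.
vectors_to_df <- function(...) {
  # Recover the expressions used in the call, e.g. var_a, var_b, var_c
  col_names <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  cols <- list(...)
  names(cols) <- col_names
  # stringsAsFactors = FALSE keeps character vectors as character;
  # vectors that are already factors stay factors
  as.data.frame(cols, stringsAsFactors = FALSE)
}

var_a <- c("a", "b", "c")
var_b <- c(1, 3, 4)
var_c <- factor(c("red", "black", "black"))

my_dataframe <- vectors_to_df(var_a, var_b, var_c)
sapply(my_dataframe, class)
#       var_a       var_b       var_c 
# "character"   "numeric"    "factor" 
```

Capturing the names explicitly with substitute keeps the wrapper independent of how data.frame derives column names; calling data.frame directly, as above, is simpler when no wrapper is needed.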

We can use tibble to preserve the column types

library(tibble)
tibble(var_a, var_b, var_c)
# A tibble: 3 x 3
#  var_a var_b var_c
#  <chr> <dbl> <fct>
#1 a         1 red  
#2 b         3 black
#3 c         4 black

Note: a tibble can be used with tidyverse operations, but if a plain data.frame is really required, converting the tibble with as.data.frame still preserves the data types.

tibble(var_a, var_b, var_c) %>%
    as.data.frame %>%
    str
#'data.frame':  3 obs. of  3 variables:
# $ var_a: chr  "a" "b" "c"
# $ var_b: num  1 3 4
# $ var_c: Factor w/ 2 levels "black","red": 2 1 1
