I'm using the R package fastDummies to create dummy variables for categorical variables. How can I select the names of the newly created columns with the dummy variables instead of getting the whole dataframe?
Example:
# Create random dataframe
vec1<-sample(1:100, 50, replace=TRUE) # continuous variable
vec2<-sample(1:100, 50, replace=TRUE) # continuous variable
vec3<-sample(1:3, 50, replace=TRUE)
vec4<-sample(1:3, 50, replace=TRUE)
mydata<-data.frame(vec1, vec2, vec3, vec4)
mydata$vec3<-factor(mydata$vec3) # categorical variable with 3 levels
mydata$vec4<-factor(mydata$vec4) # categorical variable with 3 levels
# Create dummy variables
library(fastDummies)
dummys<-dummy_columns(mydata, select_columns=c("vec3", "vec4"), remove_first_dummy = TRUE)
# Now dummys will contain this:
head(dummys)
# output:
#   vec1 vec2 vec3 vec4 vec3_2 vec3_3 vec4_2 vec4_3
# 1   40   59    3    2      0      1      1      0
# 2   55    3    1    2      0      0      1      0
# 3   26   55    1    3      0      0      0      1
# 4   38   29    1    2      0      0      1      0
# 5   33   54    2    1      1      0      0      0
# 6   45   26    2    2      1      0      1      0
Now I want to select the newly created dummy variable columns without specifying them manually:
# NOT by checking the column names and picking them out by hand
colnames(dummys)
dummys$vec3_2
# etc...
Instead, I want to pick out only the columns that dummy_columns created, whatever dataset is used. dummy_columns returns the whole dataset, but I only need the names of the newly created dummy variable columns (in this example: vec3_2, vec3_3, vec4_2, and vec4_3).
Does anyone know how to do this, or another package/function that can do it?
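A very simple approach is to attach the data frame that holds the dummy columns:
attach(dummys)
After that you can refer to a column such as vec3_2 directly by name. Likewise, had you attached mydata at the start, you could have written:
vec3<-factor(vec3)
vec4<-factor(vec4)
without needing the $ operator. Note that these assignments create new objects in the global environment; they do not modify mydata itself.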
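If what you need is the names themselves rather than shorter access to the columns, one possible approach is to compare the column names before and after the call to dummy_columns. The sketch below assumes the new dummy columns are exactly those present in the result but absent from the input; new_cols and dummy_only are illustrative names that do not appear in the question, and set.seed is added only to make the toy data reproducible.

```r
library(fastDummies)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible
mydata <- data.frame(
  vec1 = sample(1:100, 50, replace = TRUE),        # continuous variable
  vec2 = sample(1:100, 50, replace = TRUE),        # continuous variable
  vec3 = factor(sample(1:3, 50, replace = TRUE)),  # categorical, 3 levels
  vec4 = factor(sample(1:3, 50, replace = TRUE))   # categorical, 3 levels
)

dummys <- dummy_columns(mydata, select_columns = c("vec3", "vec4"),
                        remove_first_dummy = TRUE)

# Names that exist in the result but not in the input are the new dummies
new_cols <- setdiff(names(dummys), names(mydata))
print(new_cols)

# Optionally keep only the dummy columns as a data frame
dummy_only <- dummys[, new_cols, drop = FALSE]
```

Because setdiff only compares names, this should work for any input data frame and any select_columns, as long as the input does not already contain columns with the same names as the generated dummies.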