I have a data frame that I want to subset several times, storing each subset under a different variable name. Let's say my data frame looks something like this:
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2,1, labels = c("good", "bad"), length =5)
f2 <- gl(3,1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5,1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x,y,z,f1,f2,f3)
> df
x y z f1 f2 f3
1 -0.56047565 1.7150650 1.2240818 good red foo
2 -0.23017749 0.4609162 0.3598138 bad green bar
3 1.55870831 -1.2650612 0.4007715 good yellow foobar
4 0.07050839 -0.6868529 0.1106827 bad red foofoo
5 0.12928774 -0.4456620 -0.5558411 good green barbar
What I want to do is create three new data frames by subsetting df and store them under different variable names. I know how to do that individually:
df_f1 <- df[,c(-5,-6)]
> df_f1
x y z f1
1 -0.56047565 1.7150650 1.2240818 good
2 -0.23017749 0.4609162 0.3598138 bad
3 1.55870831 -1.2650612 0.4007715 good
4 0.07050839 -0.6868529 0.1106827 bad
5 0.12928774 -0.4456620 -0.5558411 good
df_f2 <- df[,c(-4,-6)]
> df_f2
x y z f2
1 -0.56047565 1.7150650 1.2240818 red
2 -0.23017749 0.4609162 0.3598138 green
3 1.55870831 -1.2650612 0.4007715 yellow
4 0.07050839 -0.6868529 0.1106827 red
5 0.12928774 -0.4456620 -0.5558411 green
df_f3 <- df[,c(-4,-5)]
> df_f3
x y z f3
1 -0.56047565 1.7150650 1.2240818 foo
2 -0.23017749 0.4609162 0.3598138 bar
3 1.55870831 -1.2650612 0.4007715 foobar
4 0.07050839 -0.6868529 0.1106827 foofoo
5 0.12928774 -0.4456620 -0.5558411 barbar
However, is there a way to do it programmatically, maybe using a for loop or lapply? My problem is that I don't know how to assign the data frames to variable names such as df_f1, df_f2 and df_f3 automatically, without typing them one by one. In other words, is there a way to generate the variable names automatically and store the data frames in them using a loop or lapply?
I will apply this concept to a bigger data set, and manually typing each variable name is quite tedious.
Thanks and have a nice day to all!
list2env(setNames(lapply(df[-(1:3)],cbind,df[1:3]),paste("df",1:3,sep="_f")),.GlobalEnv)
Breakdown:
First, create a list that holds all the data frames you need.
A=lapply(df[-(1:3)],cbind,df[1:3])
This takes every column other than the first three and cbinds each one with df[1:3]. This gives a list A that has all the data frames I need. Now give every data frame in the list A a name:
B=setNames(A,paste("df",1:3,sep="_f"))
You can play with paste to see how it combines its arguments. After that, list2env assigns each element of the list, which is a data frame, to a variable of the same name in the global environment.
list2env(B,.GlobalEnv)
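One thing to watch in the one-liner above: each factor column is passed to cbind as the first argument, so it ends up as the first column, and as far as I can tell it does not keep its original name (f1, f2, f3). Here is a minimal sketch of a variant that selects columns by name instead, so the order and names match the output in the question. The helper names shared, extra and subsets are my own, not part of the original answer.

```r
# Rebuild the example data from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

shared <- names(df)[1:3]      # columns kept in every subset (hypothetical name)
extra  <- names(df)[-(1:3)]   # one subset per remaining column (hypothetical name)

# Select by column name so each subset is x, y, z plus one factor column
subsets <- lapply(extra, function(nm) df[c(shared, nm)])
names(subsets) <- paste0("df_", extra)

# Create df_f1, df_f2, df_f3 in the global environment
list2env(subsets, envir = .GlobalEnv)

names(df_f1)
```

Because the variable names are built from the column names rather than from 1:3, this should also carry over to a bigger data set where the extra columns are not numbered consecutively.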
This seems to work, using lapply:
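# number of leading columns shared by every subset
keep <- 3
# positions of the remaining columns, one subset per column
split_id <- (keep + 1):length(df)
df_list <- lapply(split_id, function(x) {
  df[, c(1:3, x)]
})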
df_list
[[1]]
x y z f1
1 -0.56047565 1.7150650 1.2240818 good
2 -0.23017749 0.4609162 0.3598138 bad
3 1.55870831 -1.2650612 0.4007715 good
4 0.07050839 -0.6868529 0.1106827 bad
5 0.12928774 -0.4456620 -0.5558411 good
[[2]]
x y z f2
1 -0.56047565 1.7150650 1.2240818 red
2 -0.23017749 0.4609162 0.3598138 green
3 1.55870831 -1.2650612 0.4007715 yellow
4 0.07050839 -0.6868529 0.1106827 red
5 0.12928774 -0.4456620 -0.5558411 green
[[3]]
x y z f3
1 -0.56047565 1.7150650 1.2240818 foo
2 -0.23017749 0.4609162 0.3598138 bar
3 1.55870831 -1.2650612 0.4007715 foobar
4 0.07050839 -0.6868529 0.1106827 foofoo
5 0.12928774 -0.4456620 -0.5558411 barbar
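The list above is unnamed, so its elements can only be reached by position. A small sketch of how it could be given the names asked for in the question, assuming the same keep and split_id as above; building the names from names(df) is my own addition.

```r
# Rebuild the example data from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

keep <- 3
split_id <- (keep + 1):length(df)
df_list <- lapply(split_id, function(i) df[, c(seq_len(keep), i)])

# Name each element after the extra column it contains
names(df_list) <- paste0("df_", names(df)[split_id])

names(df_list)
df_list[["df_f2"]]   # access one subset by name
```

Keeping the subsets in one named list, rather than as separate variables, makes it easy to loop over them later with lapply.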
Did you mean something like this?
dependent_col = c("f1", "f2", "f3")
df_l <- lapply(dependent_col, function(x) df[!(colnames(df) %in% dependent_col) | colnames(df) == x])
names(df_l) <- paste("df", dependent_col, sep="_")
df_l
You can access individual data frames using df_l$df_f1, df_l$df_f2, etc.
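Since the question also mentions a for loop, here is a rough sketch of that route using assign(), which creates a variable from a name held in a string. It assumes the same dependent_col vector as above; shared and collected are hypothetical names of my own.

```r
# Rebuild the example data from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

dependent_col <- c("f1", "f2", "f3")
shared <- setdiff(names(df), dependent_col)   # columns kept in every subset

# Create df_f1, df_f2, df_f3 one at a time in the global environment
for (nm in dependent_col) {
  assign(paste0("df_", nm), df[c(shared, nm)], envir = .GlobalEnv)
}

# mget() collects the generated variables back into a named list if needed
collected <- mget(paste0("df_", dependent_col), envir = .GlobalEnv)
names(collected)
```

In practice the named list is usually easier to work with than many generated variables, so the loop is mostly useful when separate variables are really required.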