
Subset a data frame and store to different variables in R using loop or lapply

I have a data frame that I want to subset several times, storing each subset under a different variable name. Let's say my data frame looks something like this:

set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)

f1 <- gl(2,1, labels = c("good", "bad"), length =5)
f2 <- gl(3,1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5,1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))

df <- data.frame(x,y,z,f1,f2,f3)    
> df

            x          y          z   f1     f2     f3
1 -0.56047565  1.7150650  1.2240818 good    red    foo
2 -0.23017749  0.4609162  0.3598138  bad  green    bar
3  1.55870831 -1.2650612  0.4007715 good yellow foobar
4  0.07050839 -0.6868529  0.1106827  bad    red foofoo
5  0.12928774 -0.4456620 -0.5558411 good  green barbar

What I want to do is create three new data frames by subsetting df and store them under different variable names. I know how to do that individually:

df_f1 <- df[,c(-5,-6)]

> df_f1
            x          y          z   f1
1 -0.56047565  1.7150650  1.2240818 good
2 -0.23017749  0.4609162  0.3598138  bad
3  1.55870831 -1.2650612  0.4007715 good
4  0.07050839 -0.6868529  0.1106827  bad
5  0.12928774 -0.4456620 -0.5558411 good

df_f2 <- df[,c(-4,-6)]

> df_f2
            x          y          z     f2
1 -0.56047565  1.7150650  1.2240818    red
2 -0.23017749  0.4609162  0.3598138  green
3  1.55870831 -1.2650612  0.4007715 yellow
4  0.07050839 -0.6868529  0.1106827    red
5  0.12928774 -0.4456620 -0.5558411  green

df_f3 <- df[,c(-4,-5)]
> df_f3
            x          y          z     f3
1 -0.56047565  1.7150650  1.2240818    foo
2 -0.23017749  0.4609162  0.3598138    bar
3  1.55870831 -1.2650612  0.4007715 foobar
4  0.07050839 -0.6868529  0.1106827 foofoo
5  0.12928774 -0.4456620 -0.5558411 barbar

However, is there a way to do it programmatically, maybe using a for loop or lapply? My problem is that I don't know how to assign the data frames to different variable names such as df_f1, df_f2 and df_f3 without typing them one by one. In other words, is there a way to generate the variable names automatically, so that a loop or lapply can store each data frame under its own name?

I will apply this to a bigger data set, and manually typing each variable name is quite tedious.

Thanks and have a nice day to all!

list2env(setNames(lapply(df[-(1:3)],cbind,df[1:3]),paste("df",1:3,sep="_f")),.GlobalEnv)

Breakdown:

First, create a list that holds all the data frames you need.

  A=lapply(df[-(1:3)],cbind,df[1:3])

This takes every column other than the first three and cbinds each of them with df[1:3]. That gives a list A holding all the data frames I need. Now give every data frame in the list a name:

  B=setNames(A,paste("df",1:3,sep="_f"))

You can experiment with paste to see how it joins its arguments. Finally, we copy each element of the list (each one a data frame) into the global environment as a variable of the same name:

 list2env(B,.GlobalEnv)
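
To see what each step produces, here is a small self-contained sketch of the same idea. It is a variation rather than the exact code above: it selects columns by name instead of calling cbind, because with cbind the extra column ends up in front of x, y and z and may not keep its original name. It also writes into a fresh environment called out (a hypothetical name) instead of .GlobalEnv, so nothing in the workspace is overwritten while experimenting.

```r
# Rebuild the example data frame from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

keep  <- names(df)[1:3]            # columns shared by every subset
extra <- setdiff(names(df), keep)  # the remaining columns: f1, f2, f3

# paste() joins its arguments, putting `sep` between them
new_names <- paste("df", extra, sep = "_")

# One data frame per extra column, stored under the matching name
B <- setNames(lapply(extra, function(nm) df[c(keep, nm)]), new_names)

# Copy every list element into a fresh environment as a variable
out <- list2env(B, envir = new.env())

ls(out)            # names of the variables that were created
names(out$df_f2)   # columns of one of the subsets
```

Passing .GlobalEnv as the second argument, as in the answer, creates df_f1, df_f2 and df_f3 directly in the workspace instead.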

This seems to work, using lapply:

keep<-3
split_id<-(keep+1):length(df)
df_list<- lapply(split_id, function(x){
  df[,c(1:3,x)]
})

df_list
[[1]]
            x          y          z   f1
1 -0.56047565  1.7150650  1.2240818 good
2 -0.23017749  0.4609162  0.3598138  bad
3  1.55870831 -1.2650612  0.4007715 good
4  0.07050839 -0.6868529  0.1106827  bad
5  0.12928774 -0.4456620 -0.5558411 good

[[2]]
            x          y          z     f2
1 -0.56047565  1.7150650  1.2240818    red
2 -0.23017749  0.4609162  0.3598138  green
3  1.55870831 -1.2650612  0.4007715 yellow
4  0.07050839 -0.6868529  0.1106827    red
5  0.12928774 -0.4456620 -0.5558411  green

[[3]]
            x          y          z     f3
1 -0.56047565  1.7150650  1.2240818    foo
2 -0.23017749  0.4609162  0.3598138    bar
3  1.55870831 -1.2650612  0.4007715 foobar
4  0.07050839 -0.6868529  0.1106827 foofoo
5  0.12928774 -0.4456620 -0.5558411 barbar
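
The list above is unnamed, so its elements can only be reached by position. If separate variables such as df_f1 really are wanted, one possible extension (a sketch, not part of the answer above) is to name the list after the extra columns and then use a for loop with assign(). It assumes the df from the question.

```r
# Rebuild the example data frame from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

keep <- 3
split_id <- (keep + 1):length(df)

# Same subsetting as above: the first `keep` columns plus one extra column
df_list <- lapply(split_id, function(i) df[, c(seq_len(keep), i)])

# Name each element after the extra column it carries
names(df_list) <- paste0("df_", names(df)[split_id])

# Loop version: assign() creates one variable per list element
# in the environment where the loop runs
for (nm in names(df_list)) {
  assign(nm, df_list[[nm]])
}

names(df_f1)   # columns of the first subset
```

Keeping the named list is often enough on its own, since df_list$df_f1 or df_list[["df_f1"]] reaches the same data frame without creating extra variables.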

Did you mean something like this?

dependent_col = c("f1", "f2", "f3")
df_l <- lapply(dependent_col, function(x) df[!(colnames(df) %in% dependent_col) | colnames(df) == x])
names(df_l) <- paste("df", dependent_col, sep="_")
df_l

You can access the individual data frames with df_l$df_f1, df_l$df_f2, etc.
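
Keeping the data frames in a named list also makes it easy to run the same operation on all of them, or to look one up by a name built at run time. A minimal sketch, assuming the df from the question (the variable wanted is a hypothetical name used only for illustration):

```r
# Rebuild the example data frame from the question
set.seed(123)
x <- rnorm(5)
y <- rnorm(5)
z <- rnorm(5)
f1 <- gl(2, 1, labels = c("good", "bad"), length = 5)
f2 <- gl(3, 1, labels = c("red", "green", "yellow"), length = 5)
f3 <- gl(5, 1, labels = c("foo", "bar", "foobar", "foofoo", "barbar"))
df <- data.frame(x, y, z, f1, f2, f3)

dependent_col <- c("f1", "f2", "f3")

# Keep every non-dependent column plus exactly one dependent column
df_l <- lapply(dependent_col, function(x) {
  df[!(colnames(df) %in% dependent_col) | colnames(df) == x]
})
names(df_l) <- paste("df", dependent_col, sep = "_")

# Apply the same check to every data frame at once
n_cols <- sapply(df_l, ncol)

# Look up one data frame by a name constructed at run time
wanted <- paste("df", "f2", sep = "_")
head(df_l[[wanted]], 2)
```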


 