Removing dataframes from list of dataframes based on condition using for loop in R

Consider the following data, named mat. My objective is to count the unique values of v1 for each id and store the count in a variable n. Then I want to remove a data frame from the list if n <= 1 and keep it when n >= 2.

id v1 v2 
1  2  3
1  2  5
2  9  8
2  4  5
3  7  8
3  1  5

Here is what I've tried:

dat  <- list()
dat2 <- list()
for (i in seq_along(unique(mat$id))){
  dat[[i]] <- data.frame(subset(mat,mat$id==unique(mat$id)[i])) 
  dat[[i]]$n <- length(unique(dat[[i]]$v1))
  if(dat[[i]]$n >= 2){
    dat2[[i]] <- dat[[i]]
  }
}

Any help is much appreciated!

Personally, I would go for the dplyr approach:

library(dplyr)
df <- data.frame(id = c(1,1,2,2,3,3),
                 v1 = c(2,2,9,4,7,1),
                 v2 = c(3,5,8,5,8,5))
df_n <- df %>%
    group_by(id) %>%                      # group the data by id
    summarise(n = length(unique(v1))) %>% # count distinct v1 values within each group
    filter(n >= 2)                        # keep groups with at least two distinct values
df_n
## A tibble: 2 x 2
#     id     n
#  <dbl> <int>
#1     2     2
#2     3     2

Now, if you want to remove from df every row whose id appears in df_n, you can use an anti_join:

anti_join(df, df_n, by = "id")
#  id v1 v2
#1  1  2  3
#2  1  2  5

If you want to keep only the rows whose id appears in df_n, you can use the inverse, a semi_join:

semi_join(df, df_n, by = "id")
#  id v1 v2
#1  2  9  8
#2  2  4  5
#3  3  7  8
#4  3  1  5
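
If you would rather stay with a list of data frames, as in the question, here is a minimal base R sketch of the same keep/drop rule. It assumes mat has the columns shown in the question and reuses the question's names dat and dat2. Note that the original loop tests a whole column inside if(), which expects a single TRUE/FALSE, and assigning dat2[[i]] only for some i leaves NULL gaps in the list; split() and Filter() avoid both problems.

```r
# Data from the question
mat <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                  v1 = c(2, 2, 9, 4, 7, 1),
                  v2 = c(3, 5, 8, 5, 8, 5))

# One data frame per id
dat <- split(mat, mat$id)

# Attach n (number of distinct v1 values) to each piece
dat <- lapply(dat, function(d) {
  d$n <- length(unique(d$v1))
  d
})

# Keep only the pieces with at least two distinct v1 values;
# n is constant within a piece, so its first element is enough
dat2 <- Filter(function(d) d$n[1] >= 2, dat)
names(dat2)
```

With the sample data, dat2 should hold the data frames for id 2 and id 3 only.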

Edit

If you want to add a new column, replace summarise with mutate. The difference between these functions is that mutate returns the original rows with the new column added, while summarise returns one row per combination of grouping variables.
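
A quick way to see that difference is to compare row counts. This is only an illustrative sketch on the sample data; the names by_id, n_summarise and n_mutate are made up for the example.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 v1 = c(2, 2, 9, 4, 7, 1),
                 v2 = c(3, 5, 8, 5, 8, 5))

by_id <- group_by(df, id)

# summarise collapses each group to a single row
n_summarise <- nrow(summarise(by_id, n = length(unique(v1))))

# mutate keeps every original row and adds n as a column
n_mutate <- nrow(mutate(by_id, n = length(unique(v1))))

c(summarise = n_summarise, mutate = n_mutate)
```

Here summarise gives one row for each of the three ids, while mutate keeps all six rows.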

After seeing your comment, the code below should get you most of the way there. You probably don't need semi_join or anti_join for this case; use filter to split the data frame however you like.

library(dplyr)
df <- data.frame(id = c(1,1,2,2,3,3),
                 v1 = c(2,2,9,4,7,1),
                 v2 = c(3,5,8,5,8,5)) %>%
  group_by(id) %>%
  mutate(n = length(unique(v1)))
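
To finish the job with filter, something along these lines should work. It is a sketch rather than the only way to do it: the names df_n, df_keep and df_drop are made up for illustration, n_distinct(v1) is used as a shorthand for length(unique(v1)), and converting the result back to a list with base R's split() is just one option.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 v1 = c(2, 2, 9, 4, 7, 1),
                 v2 = c(3, 5, 8, 5, 8, 5))

# Add n (number of distinct v1 values per id) to every row
df_n <- df %>%
  group_by(id) %>%
  mutate(n = n_distinct(v1)) %>%
  ungroup()

# Rows belonging to ids to keep (n >= 2) and to drop (n <= 1)
df_keep <- filter(df_n, n >= 2)
df_drop <- filter(df_n, n <= 1)

# If a list of data frames is needed, split the kept rows by id
dat2 <- split(as.data.frame(df_keep), df_keep$id)
names(dat2)
```

With the sample data, df_drop contains the two rows for id 1 and dat2 has one data frame each for id 2 and id 3.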
