Consider the following data, named mat. My objective is to count the unique values of v1 for each id and store the count in a variable n. Then I want to remove the data frame from the list if n <= 1 and keep it when n >= 2.
id v1 v2
1 2 3
1 2 5
2 9 8
2 4 5
3 7 8
3 1 5
Here is what I've tried:
dat <- list()
dat2 <- list()
for (i in seq_along(unique(mat$id))) {
  dat[[i]] <- data.frame(subset(mat, mat$id == unique(mat$id)[i]))
  dat[[i]]$n <- length(unique(dat[[i]]$v1))
  if (dat[[i]]$n >= 2) {
    dat2[[i]] <- dat[[i]]
  }
}
Any help is much appreciated!
Personally, I would go for the dplyr approach:
library(dplyr)

df <- data.frame(id = c(1,1,2,2,3,3),
                 v1 = c(2,2,9,4,7,1),
                 v2 = c(3,5,8,5,8,5))

df_n <- df %>%
  group_by(id) %>%                        # groups the data by a variable
  summarise(n = length(unique(v1))) %>%   # does a computation for each group
  filter(n >= 2)                          # subsets based on condition

df_n
## A tibble: 2 x 2
# id n
# <dbl> <int>
#1 2 2
#2 3 2
Now, if you want to remove from df the rows whose id remains in df_n, you can use an anti_join:
anti_join(df, df_n, by = "id")
# id v1 v2
#1 1 2 3
#2 1 2 5
If you want to keep the ids that remain in df_n, you can use the inverse, a semi_join:
semi_join(df, df_n, by = "id")
# id v1 v2
#1 2 9 8
#2 2 4 5
#3 3 7 8
#4 3 1 5
If you want to add a new column, replace summarize with mutate. The difference between these functions is that mutate returns the original data frame with the newly evaluated expressions added as columns, while summarize returns a data frame with one row per grouping combination.
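To make the difference concrete, here is a minimal sketch that runs both verbs on the same toy data and compares the row counts. The names per_group and per_row are made up for this illustration and are not part of the original answer.

```r
library(dplyr)

# Same toy data as in the answer
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 v1 = c(2, 2, 9, 4, 7, 1),
                 v2 = c(3, 5, 8, 5, 8, 5))

# summarise(): collapses each id to a single row holding the count
per_group <- df %>%
  group_by(id) %>%
  summarise(n = length(unique(v1)))

# mutate(): keeps every original row and repeats the count within each id
per_row <- df %>%
  group_by(id) %>%
  mutate(n = length(unique(v1))) %>%
  ungroup()

nrow(per_group)  # one row per id
nrow(per_row)    # same number of rows as df
```

With this data, per_group has 3 rows (one per id) and per_row has 6 rows, the same as df.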
After seeing your comment, the code below should get you most of the way there. You probably don't need semi_join/anti_join for this case; use filter to split the data frames how you like.
library(dplyr)

df <- data.frame(id = c(1,1,2,2,3,3),
                 v1 = c(2,2,9,4,7,1),
                 v2 = c(3,5,8,5,8,5)) %>%
  group_by(id) %>%
  mutate(n = length(unique(v1)))
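One possible way to finish the job with filter is sketched below. The names kept, dropped and kept_list are hypothetical, and the final split() step is only an assumption about what "a list of data frames" should look like in your case.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 v1 = c(2, 2, 9, 4, 7, 1),
                 v2 = c(3, 5, 8, 5, 8, 5)) %>%
  group_by(id) %>%
  mutate(n = length(unique(v1))) %>%
  ungroup()  # drop the grouping so later steps act on the whole table

# Rows whose id has at least two distinct v1 values
kept <- filter(df, n >= 2)

# Rows whose id has only one distinct v1 value
dropped <- filter(df, n <= 1)

# Optional: one data frame per remaining id, as in the original loop
kept_list <- split(kept, kept$id)
names(kept_list)
```

Here kept holds the four rows for ids 2 and 3, dropped holds the two rows for id 1, and kept_list has one element per remaining id.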