Filter data according to number of observations for each name
This table is a simplified version of the one I have. I have multiple observations for each user, and I want to keep only specific ones. The rule I want to apply is: if a user has just one observation, include that observation; otherwise, include only the 2nd observation (as ordered by time for each session played).
name | user_id | session_id
---|---|---
e_1 | 111 | 101
e_1 | 111 | 102
e_1 | 111 | 103
e_2 | 112 | 104
e_2 | 112 | 105
e_3 | 113 | 106
e_3 | 113 | 107
e_4 | 114 | 108
e_5 | 115 | 109
So, I want to reach this output:
name | user_id | session_id
---|---|---
e_1 | 111 | 102
e_2 | 112 | 105
e_3 | 113 | 107
e_4 | 114 | 108
e_5 | 115 | 109
I tried several approaches that use dplyr and if_else. I used the function count() to find the number of observations for each name, and then tried to filter accordingly.
Here's my attempt:
new_table<-
old_table %>%
group_by(name) %>%
arrange(name) %>%
if_else(count(name) %>% filter(n==1), filter (row_number()==1), filter (row_number()==2))
However, it's not working. Kindly guide me on how to adjust it into the right code.
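A note on why the attempt above fails: `if_else()` is vectorised over values. It expects a logical vector plus two value vectors, so it cannot choose between two `filter()` steps, and piping the grouped table into it makes the table itself the condition. One way to express the rule inside a single `filter()` is to compare `row_number()` with `min(2, n())`. The following is a minimal sketch, assuming the table is called `old_table` as in the question and that its rows are already in session order:

```r
library(dplyr)

# Sample data copied from the question's table.
old_table <- data.frame(
  name       = c("e_1", "e_1", "e_1", "e_2", "e_2", "e_3", "e_3", "e_4", "e_5"),
  user_id    = c(111, 111, 111, 112, 112, 113, 113, 114, 115),
  session_id = 101:109
)

new_table <- old_table %>%
  group_by(name) %>%
  # n() is the group size, so min(2, n()) is 1 for single-row groups and 2 otherwise
  filter(row_number() == min(2, n())) %>%
  ungroup()

print(new_table)
```

Because the comparison is evaluated per group, no separate `count()` step is needed.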
This is not the most efficient approach, but it should give you a solution.
Input data:
> df
name user_id session_id
1 e_1 111 101
2 e_1 111 102
3 e_1 111 103
4 e_2 112 104
5 e_2 112 105
6 e_3 113 106
7 e_3 113 107
8 e_4 114 108
9 e_5 115 109
And then:
rnam.df <- c()
for (n in unique(df$name)) {
  new_df <- subset(df, df$name == n)   # subset df by name
  Value <- if (nrow(new_df) > 1) 2 else 1   # take the second row if the name has several rows, else the first
  rnam.df <- append(rnam.df, row.names(new_df[Value, ]))   # collect the row name of the selected row
}
Output:
> rnam.df
[1] "2" "5" "7" "8" "9"
> df[rnam.df, ]
name user_id session_id
2 e_1 111 102
5 e_2 112 105
7 e_3 113 107
8 e_4 114 108
9 e_5 115 109
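The same selection can also be written without an explicit loop. The sketch below is one possible vectorised base R variant, not the method used above: it uses `ave()` to compute each row's position within its name group and the size of that group, then keeps the rows whose position equals `pmin(2, size)`. The variable names `pos`, `size` and `result` are only illustrative, and the rows are assumed to be in session order already.

```r
# Input data as shown above.
df <- data.frame(
  name       = c("e_1", "e_1", "e_1", "e_2", "e_2", "e_3", "e_3", "e_4", "e_5"),
  user_id    = c(111, 111, 111, 112, 112, 113, 113, 114, 115),
  session_id = 101:109
)

# Position of each row within its name group (1, 2, 3, ...), in current row order.
pos <- ave(seq_len(nrow(df)), df$name, FUN = seq_along)
# Number of rows in the name group that each row belongs to.
size <- ave(seq_len(nrow(df)), df$name, FUN = length)

# Keep row 2 of groups with several rows, and row 1 of single-row groups.
result <- df[pos == pmin(2, size), ]
print(result)
```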
EDIT
The third column should not be a problem, since I did not use it. I took the values from the earlier version of the post, before you made the edit!
A tidyverse option:
library(tidyverse)
tribble(
~name, ~user_id, ~session_id,
"e_1", 111, 101,
"e_1", 111, 102,
"e_1", 111, 103,
"e_2", 112, 104,
"e_2", 112, 105,
"e_3", 113, 106,
"e_3", 113, 107,
"e_4", 114, 108,
"e_5", 115, 109
) |>
group_by(user_id) |>
filter(row_number() <= 2) |>
slice_tail(n = 1)
#> # A tibble: 5 × 3
#> # Groups: user_id [5]
#> name user_id session_id
#> <chr> <dbl> <dbl>
#> 1 e_1 111 102
#> 2 e_2 112 105
#> 3 e_3 113 107
#> 4 e_4 114 108
#> 5 e_5 115 109
Created on 2022-06-06 by the reprex package (v2.0.1)
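The pipeline above picks rows by their position within each group, so it relies on the rows already being in session order. If that is not guaranteed, the data can be sorted within each group first. The sketch below assumes that `session_id` increases with time, which the question does not state explicitly. The data frame names `sessions` and `shuffled` are hypothetical, and the rows are shuffled on purpose to show that the sort matters.

```r
library(dplyr)

# The question's data, then deliberately shuffled out of session order.
sessions <- data.frame(
  name       = c("e_1", "e_1", "e_1", "e_2", "e_2", "e_3", "e_3", "e_4", "e_5"),
  user_id    = c(111, 111, 111, 112, 112, 113, 113, 114, 115),
  session_id = 101:109
)
shuffled <- sessions[c(3, 9, 5, 1, 7, 2, 8, 4, 6), ]

result <- shuffled |>
  group_by(user_id) |>
  # Assumption: a larger session_id means a later session.
  arrange(session_id, .by_group = TRUE) |>
  filter(row_number() <= 2) |>   # keep at most the first two sessions per user
  slice_tail(n = 1) |>           # then take the last of those
  ungroup()

print(result)
```

If a real timestamp column is available, it would be the more natural sort key in `arrange()`.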