Filter data based on the number of observations for each name

Question

This table is a simplified version of the one I have. I have multiple observations for each user and want to keep only specific ones. The rule is: if a user has just one observation, include that observation; otherwise, include only the 2nd observation (as ordered by time for each session played).

name	user_id	session_id
e_1	111	101
e_1	111	102
e_1	111	103
e_2	112	104
e_2	112	105
e_3	113	106
e_3	113	107
e_4	114	108
e_5	115	109

So, I want to reach this output:

name	user_id	session_id
e_1	111	102
e_2	112	105
e_3	113	107
e_4	114	108
e_5	115	109

I tried several approaches using dplyr and if_else. I used count() to find the number of observations for each name, and then tried to filter accordingly.

Here's my attempt:

new_table<- 
old_table %>%
group_by(name) %>%
arrange(name) %>% 
if_else(count(name) %>% filter(n==1), filter (row_number()==1), filter (row_number()==2))

However, it's not working. Please guide me in adjusting it into the right code.

Answer 1

This is not the most efficient approach, but it should give you a solution.

Input data:

> df
  name user_id session_id
1  e_1     111        101
2  e_1     111        102
3  e_1     111        103
4  e_2     112        104
5  e_2     112        105
6  e_3     113        106
7  e_3     113        107
8  e_4     114        108
9  e_5     115        109

And then:

rnam.df <- c()
for (n in unique(df$name)) {
  new_df <- subset(df, df$name == n)       # subset df by name
  Value <- if (nrow(new_df) > 1) 2 else 1  # second row if the name has several rows, otherwise the only row
  rnam.df <- append(rnam.df, row.names(new_df[Value, ]))  # collect the row name of the selected row
}

Out:

> rnam.df
[1] "2" "5" "7" "8" "9"

> df[rnam.df, ]
  name user_id session_id
2  e_1     111        102
5  e_2     112        105
7  e_3     113        107
8  e_4     114        108
9  e_5     115        109

EDIT

The 3rd column should not be a problem, since I did not work with it. I took the values from the earlier version of the post, before you made the edit!
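For larger tables, the same selection can be made without an explicit loop. The following is only a sketch in base R, not the answer's original method: it assumes `df` has the columns shown above and that rows are already in time order within each name. The helper vectors `pos` and `size` are names introduced here for illustration.

```r
# Same data as in the answer above
df <- data.frame(
  name = c("e_1", "e_1", "e_1", "e_2", "e_2", "e_3", "e_3", "e_4", "e_5"),
  user_id = c(111, 111, 111, 112, 112, 113, 113, 114, 115),
  session_id = 101:109
)

# Position of each row within its name group (1, 2, 3, ...)
pos <- ave(seq_len(nrow(df)), df$name, FUN = seq_along)
# Number of rows in the group each row belongs to
size <- ave(seq_len(nrow(df)), df$name, FUN = length)

# Keep the 2nd row of groups with more than one row, otherwise the only row
result <- df[pos == pmin(size, 2), ]
result
```

`pmin(size, 2)` gives the target position per row: 1 for single-row groups and 2 for everything else, so comparing it with `pos` selects exactly one row per name.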

Answer 2

A tidyverse option:

library(tidyverse)

tribble(
  ~name, ~user_id, ~session_id,
  "e_1", 111, 101,
  "e_1", 111, 102,
  "e_1", 111, 103,
  "e_2", 112, 104,
  "e_2", 112, 105,
  "e_3", 113, 106,
  "e_3", 113, 107,
  "e_4", 114, 108,
  "e_5", 115, 109
) |> 
  group_by(user_id) |> 
  filter(row_number() <= 2) |> 
  slice_tail(n = 1)
#> # A tibble: 5 × 3
#> # Groups:   user_id [5]
#>   name  user_id session_id
#>   <chr>   <dbl>      <dbl>
#> 1 e_1       111        102
#> 2 e_2       112        105
#> 3 e_3       113        107
#> 4 e_4       114        108
#> 5 e_5       115        109

Created on 2022-06-06 by the reprex package (v2.0.1)
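The `filter()` plus `slice_tail()` pair works because keeping at most the first two rows of a group and then taking the last of them yields the 2nd row when it exists and the 1st otherwise. A possible single-step variant is sketched below. It assumes that `session_id` increases with time (the question does not show a time column) and groups by `name` as in the question's attempt; adjust both if your real data differs.

```r
library(dplyr)

df <- data.frame(
  name = c("e_1", "e_1", "e_1", "e_2", "e_2", "e_3", "e_3", "e_4", "e_5"),
  user_id = c(111, 111, 111, 112, 112, 113, 113, 114, 115),
  session_id = c(101, 102, 103, 104, 105, 106, 107, 108, 109)
)

result <- df |>
  arrange(name, session_id) |>  # assumption: session_id reflects time order
  group_by(name) |>
  slice(min(2, n())) |>         # row 2 if the group has it, otherwise row 1
  ungroup()

result
```

Inside `slice()`, `n()` is the size of the current group, so `min(2, n())` is 1 for single-row groups and 2 for all others.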

Answer 1
1 2022-06-06 09:56:54

Answer 2
1 Accepted 2022-06-06 10:08:13
