Subsetting a dataset in R based on the number of observations that meet criteria [R]
I have a dataset that looks like this:
Employee Month CSAT
ABROWN February 4
ABROWN January 5
ABROWN March 3
ABROWN March 5
JSMITH February 5
JSMITH January 3
JSMITH February 5
JSMITH March 5
JSMITH February 5
JSMITH January 4
Except, of course, much larger. I'm trying to run an analysis on Employee by month, but I don't want to include employees who don't have enough observations in a given month.
For instance, let's say I only want to keep observations where an Employee has at least two CSAT scores in the same month. In this case we would filter out observations 1, 2, and 8.
I've messed with this for too long and am at a loss.
We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'Employee' and 'Month', and if the number of observations in a group (.N) is greater than 1, return the Subset of Data.table (.SD) for that group.
library(data.table)
setDT(df1)[, if(.N >1) .SD, by = .(Employee, Month)]
# Employee Month CSAT
#1: ABROWN March 3
#2: ABROWN March 5
#3: JSMITH February 5
#4: JSMITH February 5
#5: JSMITH February 5
#6: JSMITH January 3
#7: JSMITH January 4
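As a variation on the same idea, the group size can be stored in a column first and the rows filtered on it afterwards. This is only a sketch: the names dt, n_obs and min_obs are made up for illustration, and df1 is rebuilt from the sample in the question so the snippet runs on its own.

```r
library(data.table)

# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Employee = rep(c("ABROWN", "JSMITH"), times = c(4, 6)),
  Month = c("February", "January", "March", "March",
            "February", "January", "February", "March", "February", "January"),
  CSAT = c(4, 5, 3, 5, 5, 3, 5, 5, 5, 4)
)

min_obs <- 2  # hypothetical name: minimum scores per Employee/Month

dt <- as.data.table(df1)                     # copy, so df1 itself is left untouched
dt[, n_obs := .N, by = .(Employee, Month)]   # add the group size to every row
res <- dt[n_obs >= min_obs]                  # keep rows from large-enough groups
res
```

Unlike the `if (.N > 1) .SD` form, this keeps the rows in their original order and leaves the count visible in n_obs, which can help when checking why a row was kept or dropped.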
Or use dplyr with similar logic in filter after grouping by 'Employee' and 'Month':
library(dplyr)
df1 %>%
  group_by(Employee, Month) %>%
  filter(n() > 1)
Or use base R with ave to create a logical index that filters the rows of 'df1':
df1[with(df1, ave(CSAT, Employee, Month, FUN=length)>1),]
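To see what the ave call is doing, it may help to split the one-liner into steps. The sketch below rebuilds df1 from the sample in the question; group_size, keep and kept are illustrative names only.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Employee = rep(c("ABROWN", "JSMITH"), times = c(4, 6)),
  Month = c("February", "January", "March", "March",
            "February", "January", "February", "March", "February", "January"),
  CSAT = c(4, 5, 3, 5, 5, 3, 5, 5, 5, 4)
)

# ave() returns one value per row of df1: here, the number of rows that
# share that row's Employee/Month combination
group_size <- with(df1, ave(CSAT, Employee, Month, FUN = length))

keep <- group_size > 1   # logical index, TRUE where the group has 2+ scores
kept <- df1[keep, ]      # same result as the one-liner above

which(!keep)             # row numbers that are filtered out
```

On the sample data the dropped rows are 1, 2 and 8, matching the observations named in the question.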