Delete rows with value frequencies less than x in R

I have a data frame in R like the following:

V1 V2 V3
1  2  3
1  43 54
2  34 53
3  34 51
3  43 42
...

I want to delete all rows whose V1 value has a frequency lower than 2. So in my example the row with V1 = 2 should be deleted, because the value "2" appears only once in the column ("1" and "3" appear twice each).

I tried to add an extra column holding the frequency of V1, so that I could keep only the rows where the frequency is > 1, but with the following I only get NAs in the extra column:

data$Frequency <- table(data$V1)[data$V1]

Thanks

You can try this:

library(dplyr)
df %>% group_by(V1) %>% filter(n() > 1)
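
As a minimal, self-contained sketch of this approach, assuming the example data from the question: n() gives the number of rows in the current group, so filter(n() > 1) keeps only the groups that occur more than once. The trailing ungroup() is an addition here, not part of the answer above; it just returns a plain, ungrouped result.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  V1 = c(1, 1, 2, 3, 3),
  V2 = c(2, 43, 34, 34, 43),
  V3 = c(3, 54, 53, 51, 42)
)

# Group by V1, keep groups with more than one row, then drop the grouping.
result <- df %>%
  group_by(V1) %>%
  filter(n() > 1) %>%
  ungroup()

print(result)
```

To use a different threshold x, replace n() > 1 with n() >= x.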

You can also consider using data.table. We first count the occurrences of each value in V1, then keep only the rows where that count is greater than 1. Finally, we remove the count column, as we no longer need it.

library(data.table)

setDT(dat)
dat2 <- dat[, n := .N, by = V1][n > 1][, n := NULL]

Or even quicker, thanks to RichardScriven:

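# Name the index column explicitly: the default name V1 would clash with the grouping column
dat2 <- dat[dat[, .(idx = .I[.N >= 2]), by = V1]$idx]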
> dat2
   V1 V2 V3
1:  1  2  3
2:  1 43 54
3:  3 34 51
4:  3 43 42
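
One thing to watch with the chained := version: as far as I can tell, it adds the column n to dat itself by reference, and only the filtered copy dat2 has it removed again. A sketch of an alternative that expresses the same count-then-filter idea without a helper column is below. It assumes the example data from the question; the if (.N >= 2) .SD form is a swapped-in idiom, not the code from the answer above.

```r
library(data.table)

# Example data from the question.
dat <- data.table(
  V1 = c(1, 1, 2, 3, 3),
  V2 = c(2, 43, 34, 34, 43),
  V3 = c(3, 54, 53, 51, 42)
)

# For each V1 group, return its rows (.SD) only if the group has at least
# two rows; groups that fail the test contribute nothing to the result.
dat2 <- dat[, if (.N >= 2) .SD, by = V1]

print(dat2)

# dat itself is unchanged: no helper column was added to it.
print(names(dat))
```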

With this you do not need to load a library:

res <- data.frame(V1 = c(1, 1, 2, 3, 3, 3), V2 = rnorm(6), V3 = rnorm(6))
# Count each V1 value once, then keep the rows whose value occurs at least twice
tab <- table(res$V1)
res[res$V1 %in% names(tab)[tab >= 2], ]
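
Also in base R, the attempt from the question can probably be made to work with a small change. Assuming V1 is numeric, table(data$V1)[data$V1] indexes the table by position rather than by name, so any V1 value larger than the number of distinct values yields NA. Indexing with as.character(data$V1) looks each value up by name instead. The sketch below uses the question's example data plus one extra row (V1 = 43, a hypothetical addition) to make the positional problem visible. The ave() line is an alternative I am adding, not something from the answers above.

```r
# Example data from the question, plus one hypothetical row with a large
# V1 value to expose the positional-indexing problem.
data <- data.frame(
  V1 = c(1, 1, 2, 3, 3, 43),
  V2 = c(2, 43, 34, 34, 43, 7),
  V3 = c(3, 54, 53, 51, 42, 9)
)

counts <- table(data$V1)

# Numeric indexing is positional: counts has only 4 entries,
# so asking for position 43 gives NA.
by_position <- counts[data$V1]

# Indexing by name looks up each value's own count.
data$Frequency <- as.vector(counts[as.character(data$V1)])

# Keep rows whose V1 value occurs at least twice; drop the helper column.
kept <- data[data$Frequency >= 2, c("V1", "V2", "V3")]

# Alternative without the helper column: ave() returns the group size
# for every row.
kept_ave <- data[ave(data$V1, data$V1, FUN = length) >= 2, c("V1", "V2", "V3")]

print(kept)
```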


 