Subset a data frame based on count of values of column x. Want only the top two in R
Here is the data frame:
p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b")
df <- data.frame(p, x)
I want to subset the data frame so that I get a new data frame containing only the top two values of "x", ranked by the count of each "x".
One of the simplest ways to do this is with the data.table package. You can read more about it in its documentation. Basically, it allows for fast and easy aggregation of your data.
Please note that I modified your initial data by appending the elements 10 and "c" to p and x, respectively. This way, group c has two rows, and you won't see an NA when filtering the top two observations per group.
The idea is to sort your dataset and then use .SD, a special data.table symbol that holds the subset of data for each group. It is a convenient way to subset, filter, or extract observations by group.
Please see the code below.
library(data.table)
p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2, 10)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b", "c")
df <- data.table(p, x)
# Sort by the group x and then by p in descending order
setorder( df, x, -p )
# Extract the first two rows by group "x"
top_two <- df[ , .SD[ 1:2 ], by = x ]
top_two
#>    x  p
#> 1: a 45
#> 2: a  6
#> 3: b  6
#> 4: b  3
#> 5: c 54
#> 6: c 10
Created on 2021-02-16 by the reprex package (v1.0.0)
Does this work for you?
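Note that the code above keeps the two largest p values within each group of x. If what is wanted is instead every row belonging to the two most frequent values of x, a minimal sketch along the following lines might work. It assumes the same modified data, and the names dt, counts, top_x, and by_count are illustrative only; they do not come from the original answer.

```r
library(data.table)

p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2, 10)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b", "c")
dt <- data.table(p, x)

# Count the rows for each value of x (.N is the group size),
# then order the groups from most to least frequent.
counts <- dt[, .N, by = x][order(-N)]

# Names of the two most frequent values of x (illustrative helper name).
top_x <- head(counts$x, 2)

# Keep every row whose x is one of those two values.
by_count <- dt[x %in% top_x]
by_count
```

With this data, a and b have four rows each and c has two, so by_count should contain the eight rows for a and b. If several values tie at the cutoff, head() simply takes whichever come first in the ordered counts, so the tie-breaking rule is something to decide explicitly.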
Using dplyr:
library(dplyr)
df %>%
  add_count(x) %>%
  slice_max(n, n = 2)
   p x n
1  1 a 4
2  3 b 4
3 45 a 4
4  1 a 4
5  1 b 4
6  6 a 4
7  6 b 4
8  2 b 4
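One caveat, as far as I can tell: the snippet above ranks rows rather than groups, so slice_max(n, n = 2) returns both a and b here only because they tie at four rows each. With unequal counts it would likely keep just the most frequent group. A sketch that ranks the groups themselves, assuming the original data from the question and using the illustrative names top_x and top_two, could look like this:

```r
library(dplyr)

p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b")
df <- data.frame(p, x)

# One row per value of x with its count, most frequent first,
# then take the names of the first two groups.
top_x <- df %>%
  count(x, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(x)

# Keep every row whose x is one of those two values.
top_two <- df %>%
  filter(x %in% top_x)
top_two
```

Because the ranking is done on one row per group, the result does not depend on the top groups having equal counts. Ties at the second position are broken by row order in the counted table, which may or may not be what you want.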