
Subset a data frame based on count of values of column x. Want only the top two in R

Here is the data frame:

p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b")
df <- data.frame(p, x)

I want to subset the data frame so that I get a new data frame containing only the top two values of "x", ranked by how often each value of "x" occurs.

One of the simplest ways to achieve what you want is with the data.table package. You can read more about it here. Basically, it allows for fast and easy aggregation of your data.

Please note that I modified your initial data by appending the elements 10 and "c" to p and x, respectively. This way, you won't see an NA when filtering the top two observations.

The idea is to sort your dataset and then use .SD, a special data.table symbol that holds the subset of data for each group, which is a convenient way of subsetting, filtering, or extracting observations.

Please see the code below.

library(data.table)

p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2, 10)  
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b", "c") 
df <- data.table(p, x)

# Sort by the group x and then by p in descending order
setorder( df, x, -p )

# Extract the first two rows by group "x"
top_two <- df[ , .SD[ 1:2 ], by = x ]
top_two
#>    x  p
#> 1: a 45
#> 2: a  6
#> 3: b  6
#> 4: b  3
#> 5: c 54
#> 6: c 10
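
The padding of the data appears to be needed only because .SD[1:2] indexes past the end of a group that has a single row. As a minimal sketch on the original, unpadded data from the question, head(.SD, 2) returns at most two rows per group and so should avoid the NA row. The names dt, with_na and top_two_head are chosen here for illustration only.

```r
library(data.table)

# Original, unpadded data from the question: group "c" has a single row
p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b")
dt <- data.table(p, x)

# Sort by the group x and then by p in descending order
setorder(dt, x, -p)

# .SD[1:2] asks for two rows even when the group has one,
# so group "c" is padded with an NA row
with_na <- dt[, .SD[1:2], by = x]

# head() returns at most two rows per group, so nothing is padded
top_two_head <- dt[, head(.SD, 2), by = x]
```

Printing with_na and top_two_head side by side shows the difference for group "c".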

Created on 2021-02-16 by the reprex package (v1.0.0)

Does this work for you?
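
If the goal is instead to keep the rows of the two most frequent values of x, as the question title suggests, a data.table sketch could look like the following. It uses the original data from the question; the names dt, counts, top_x and subset_dt are hypothetical, and ties at the cutoff are simply broken by order of appearance.

```r
library(data.table)

p <- c(1, 3, 45, 1, 1, 54, 6, 6, 2)
x <- c("a", "b", "a", "a", "b", "c", "a", "b", "b")
dt <- data.table(p, x)

# Count rows per value of x and sort by count, largest first
counts <- dt[, .N, by = x][order(-N)]

# Take the two most frequent values of x
top_x <- head(counts$x, 2)

# Keep only the rows whose x is one of those values
subset_dt <- dt[x %in% top_x]
```

Here "a" and "b" each occur four times and "c" once, so subset_dt keeps the rows for "a" and "b".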

Using dplyr:

library(dplyr)
df %>% 
  add_count(x) %>% 
  slice_max(n, n = 2)

   p x n
1  1 a 4
2  3 b 4
3 45 a 4
4  1 a 4
5  1 b 4
6  6 a 4
7  6 b 4
8  2 b 4
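
Note that slice_max() ranks rows rather than groups, so the result above seems to rely on "a" and "b" being tied at four rows each. As a hedged sketch for the case where the counts differ, one option is to find the two most frequent values first and then filter on them. The data frame df2 below is illustrative (not from the question), with counts a = 4, b = 3, c = 2, and the names top_x, result and ranked are hypothetical.

```r
library(dplyr)

# Illustrative data (an assumption, not from the question):
# counts are a = 4, b = 3, c = 2
df2 <- data.frame(
  p = c(1, 3, 45, 1, 1, 54, 6, 6, 10),
  x = c("a", "b", "a", "a", "b", "c", "a", "b", "c")
)

# Find the two most frequent values of x
top_x <- df2 %>%
  count(x, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(x)

# Keep only the rows that belong to those values
result <- df2 %>% filter(x %in% top_x)

# For comparison: ranking rows by their group count keeps only group "a" here,
# because all four "a" rows tie for first place
ranked <- df2 %>%
  add_count(x) %>%
  slice_max(n, n = 2)
```

With this data, result holds the seven rows for "a" and "b", while ranked holds only the four rows for "a".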

