Selecting top N values within a group in a column using R
I need to select the top two values for each yearmonth group from the following data frame in R. I have already sorted the data by count and yearmonth. How can I achieve that with the following data?
   yearmonth            name count
1     201310           Dovas     5
2     201310         Indulgd     2
3     201310         Justina     1
4     201310          Jolita     1
5     201311 Shahrukh Sheikh     1
6     201311           Dovas    29
7     201311         Justina    13
8     201311            Lina     8
9     201312         sUPERED     7
10    201312     John Hansen     7
11    201312         Lina D.     6
12    201312       joanna1st     5
Or using data.table (mydf from @jazzurro's post). Some options are:
library(data.table)
setDT(mydf)[order(yearmonth,-count), .SD[1:2], by=yearmonth]
Or
setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by=yearmonth]$V1,]
Or
setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
,.SD[1:2], by=yearmonth]
#   yearmonth        name count
#1:    201310       Dovas     5
#2:    201310     Indulgd     2
#3:    201311       Dovas    29
#4:    201311     Justina    13
#5:    201312     sUPERED     7
#6:    201312 John Hansen     7
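One caveat with .SD[1:2]: if a group has fewer than two rows, indexing past the end pads the result with an NA row. A minimal sketch of a variant that avoids this by using head(.SD, 2) instead; the small dt table here is made-up illustration data, not the original mydf:

```r
library(data.table)

# Illustration data (an assumption, not from the question):
# group 201311 has only one row.
dt <- data.table(
  yearmonth = c("201310", "201310", "201310", "201311"),
  name      = c("Dovas", "Indulgd", "Justina", "Lina"),
  count     = c(5, 2, 1, 8)
)

# Sort by group, then by descending count, and keep at most two rows per group.
# head() returns fewer rows when the group is short, so no NA padding occurs.
top2 <- dt[order(yearmonth, -count), head(.SD, 2), by = yearmonth]
print(top2)
```

With .SD[1:2] in place of head(.SD, 2), the 201311 group would come back with an extra row of NA values.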
Here is one way:
library(dplyr)
mydf %>%
group_by(yearmonth) %>%
arrange(desc(count)) %>%
slice(1:2)
#  yearmonth        name count
#1    201310       Dovas     5
#2    201310     Indulgd     2
#3    201311       Dovas    29
#4    201311     Justina    13
#5    201312     sUPERED     7
#6    201312 John Hansen     7
DATA
mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4),
name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
"Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
"Lina D.", "joanna1st"),
count = c(5,2,1,1,1,29,13,8,7,7,6,5),
stringsAsFactors = FALSE)
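In newer dplyr versions (1.0.0 and later, as far as I know), slice_max() does the sorting and the slicing in one step, so the separate arrange() is not needed. A sketch, assuming such a version is installed; with_ties = FALSE keeps exactly two rows per group even when counts are tied:

```r
library(dplyr)

# Same data as in the DATA section above, repeated so the snippet runs on its own.
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Keep the two rows with the largest count within each yearmonth group.
top2 <- mydf %>%
  group_by(yearmonth) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```

Leaving with_ties at its default (TRUE) may return more than two rows for a group when the second-largest count is shared.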
Using base R you could do something like:
# sort the data, skip if already done
df <- df[order(df$yearmonth, df$count, decreasing = TRUE),]
Then, to get the top two elements:
df[ave(df$count, df$yearmonth, FUN = seq_along) <= 2, ]
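The two base R steps can be wrapped into a small function that works for any n. This is only a sketch: top_n_per_group and its argument names are made up for illustration, and it assumes the value column is numeric.

```r
# Hypothetical helper (not from the answer above): keep the top n rows per group.
top_n_per_group <- function(data, group_col, value_col, n = 2) {
  # Sort by group (ascending), then by value (descending) within each group.
  ord <- order(data[[group_col]], -data[[value_col]])
  sorted <- data[ord, , drop = FALSE]
  # Number the rows within each group; ave() applies seq_along per group.
  rank_in_group <- ave(seq_len(nrow(sorted)), sorted[[group_col]], FUN = seq_along)
  sorted[rank_in_group <= n, , drop = FALSE]
}

# Same data as in the question.
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

res <- top_n_per_group(mydf, "yearmonth", "count", n = 2)
print(res)
```

Unlike the snippet above, this sorts the groups in ascending order and only the counts in descending order, so the output starts with 201310.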