R choose the first item from ordinal values in a data frame
Suppose I have a data frame:
df=data.frame(cat=c("b1","b2","b3","b2","b5","b1","b3"),
              item=c("a1","a2","a3","a4","a1","a3","a4"),
              status=c("ok","good","bad","excellent","ok","good","bad"))
For each category b1-b5, I need to choose only the top item (ranked by status from excellent to good to ok to bad) and its corresponding status; in case of a tie, take a random one.
So for b1 it should take a3 good instead of a1 ok, and for b3 it could take either a3 bad or a4 bad. Sample output:
cat item status
b1 a3 good
What's the best way to do this?
You could try:
df$status <- factor(df$status, levels=c("excellent", "good", "ok", "bad"))
library(dplyr)
df %>%
  group_by(cat) %>%
  arrange(status) %>%
  filter(row_number()==1)
# cat item status
#1 b2 a4 excellent
#2 b1 a3 good
#3 b5 a1 ok
#4 b3 a3 bad
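The pipeline above always keeps the first row among ties (a3 for b3). A minimal sketch of a random tie-break in dplyr follows, assuming dplyr 1.0.0 or later for slice_sample(); the name res and the set.seed() call are only for illustration and are not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  cat    = c("b1", "b2", "b3", "b2", "b5", "b1", "b3"),
  item   = c("a1", "a2", "a3", "a4", "a1", "a3", "a4"),
  status = c("ok", "good", "bad", "excellent", "ok", "good", "bad")
)
df$status <- factor(df$status, levels = c("excellent", "good", "ok", "bad"))

set.seed(1)  # assumption: fixed seed only to make the illustration repeatable

res <- df %>%
  group_by(cat) %>%
  # keep every row tied for the best status (lowest level index) in its group
  filter(as.integer(status) == min(as.integer(status))) %>%
  # then draw one of the tied rows at random within each group
  slice_sample(n = 1) %>%
  ungroup()

print(res)
```

Comparing the integer codes avoids calling min() on an unordered factor, which R does not allow.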
Or using data.table:
library(data.table)
setDT(df)[,.SD[order(status)][1], by=cat]
# cat item status
#1: b1 a3 good
#2: b2 a4 excellent
#3: b3 a3 bad
#4: b5 a1 ok
I noticed that you wanted to get a random sample in case of a tie:
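# if all rows in a group share one status, draw one at random;
# otherwise take the best-ranked row
setDT(df)[, if(length(status)>1 && length(unique(status))==1)
.SD[sample(1:.N,1)]
else .SD[order(status)][1]
, by=cat]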
# cat item status
#1: b1 a3 good
#2: b2 a4 excellent
#3: b3 a3 bad
#4: b5 a1 ok
Running it again can pick the other tied row for b3:
df[, if(length(status)>1 && length(unique(status))==1)
.SD[sample(1:.N,1)]
else .SD[order(status)][1] ,
by=cat]
# cat item status
#1: b1 a3 good
#2: b2 a4 excellent
#3: b3 a4 bad
#4: b5 a1 ok
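Note that the condition above only samples when every row in a group has the same status. If a group held, say, two good rows and one ok row, it would fall through to the else branch and always return the first good row. A rough sketch of a more general version follows; the helper pick_top and the extra row (b1, a5, good) are hypothetical and added only to create such a tie.

```r
library(data.table)

# the question's data plus one hypothetical extra row (b1, a5, good),
# added only so that b1 has a tie for its best status
dt <- data.table(
  cat    = c("b1", "b2", "b3", "b2", "b5", "b1", "b3", "b1"),
  item   = c("a1", "a2", "a3", "a4", "a1", "a3", "a4", "a5"),
  status = factor(c("ok", "good", "bad", "excellent", "ok", "good", "bad", "good"),
                  levels = c("excellent", "good", "ok", "bad"))
)

# hypothetical helper: choose one row at random among those with the best status
pick_top <- function(d) {
  codes <- as.integer(d$status)       # 1 = excellent, ..., 4 = bad
  best  <- which(codes == min(codes)) # row indices tied for the best status
  # index through sample.int() so a single candidate is returned as-is
  d[best[sample.int(length(best), 1)]]
}

res <- dt[, pick_top(.SD), by = cat]
print(res)
```

Here b1 returns either a3 or a5 (both good), and never a1.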
First put status in the correct order:
R>df$status <- ordered(df$status, c("bad", "ok", "good", "excellent"))
Then:
R>by(df, df$cat, function(d) d[which.max(d$status), ])
df$cat: b1
cat item status
6 b1 a3 good
------------------------------------------------------------
df$cat: b2
cat item status
4 b2 a4 excellent
------------------------------------------------------------
df$cat: b3
cat item status
3 b3 a3 bad
------------------------------------------------------------
df$cat: b5
cat item status
5 b5 a1 ok
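which.max() returns the first maximum, so b3 above is always a3. A base R sketch that breaks ties at random and returns a single data frame could look like the following; the helper name pick_one is hypothetical and not part of the original answer.

```r
df <- data.frame(
  cat    = c("b1", "b2", "b3", "b2", "b5", "b1", "b3"),
  item   = c("a1", "a2", "a3", "a4", "a1", "a3", "a4"),
  status = c("ok", "good", "bad", "excellent", "ok", "good", "bad")
)
df$status <- ordered(df$status, c("bad", "ok", "good", "excellent"))

# hypothetical helper: one random row among those with the highest status
pick_one <- function(d) {
  top <- which(d$status == max(d$status))  # max() works on ordered factors
  d[top[sample.int(length(top), 1)], ]
}

# split by category, pick one row per piece, and bind the pieces back together
res <- do.call(rbind, lapply(split(df, df$cat), pick_one))
print(res)
```

Indexing with sample.int(length(top), 1) rather than sample(top, 1) avoids the case where sample() treats a single number n as the range 1:n.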