
R: choose the first item from ordinal values in a data frame

Suppose I have a data frame:

df=data.frame(cat=c("b1","b2","b3","b2","b5","b1","b3"),
              item=c("a1","a2","a3","a4","a1","a3","a4"),
              status=c("ok","good","bad","excellent","ok","good","bad"))

For each category b1-b5, I need to pick only the top item (ranked by status: excellent, then good, then ok, then bad) together with its status. In case of a tie, pick one at random.

So for b1 it would take a3 (good) rather than a1 (ok), and for b3 it could take either a3 (bad) or a4 (bad). Sample output:

  cat item    status
  b1   a3     good

What's the best way to do this?

You could try:

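  # Order the levels from best to worst so that sorting puts the best status first
  df$status <- factor(df$status, levels=c("excellent", "good", "ok", "bad"))
  library(dplyr)
  df %>%
    group_by(cat) %>%
    arrange(status) %>%
    # keep the first (best-ranked) row of each category
    filter(row_number()==1)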
  #  cat item    status
  #1  b2   a4 excellent
  #2  b1   a3      good
  #3  b5   a1        ok
  #4  b3   a3       bad
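
The pipeline above always keeps the first row among tied statuses. If ties should instead be broken at random, as the question asks, one possible dplyr sketch is shown here. It assumes dplyr 1.0.0 or later (for `slice_sample()`); the name `top_random` is only illustrative.

```r
library(dplyr)

df <- data.frame(cat    = c("b1","b2","b3","b2","b5","b1","b3"),
                 item   = c("a1","a2","a3","a4","a1","a3","a4"),
                 status = c("ok","good","bad","excellent","ok","good","bad"))

# Best status first, so the smallest integer code is the top rank
df$status <- factor(df$status, levels = c("excellent", "good", "ok", "bad"))

top_random <- df %>%
  group_by(cat) %>%
  # keep every row tied for the best status within the category
  filter(as.integer(status) == min(as.integer(status))) %>%
  # then draw one of the tied rows at random
  slice_sample(n = 1) %>%
  ungroup()

print(top_random)
```

For b3 this returns either a3 or a4; the other categories have a single best row, so their result does not change between runs.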

Or using data.table:

 library(data.table)
 setDT(df)[,.SD[order(status)][1], by=cat]
 #   cat item    status
 #1:  b1   a3      good
 #2:  b2   a4 excellent
 #3:  b3   a3       bad
 #4:  b5   a1        ok

Update

I noticed that you wanted a random pick in case of a tie:

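 # If every row in the category has the same status, sample one row;
 # otherwise take the best-ranked row
 setDT(df)[, if (length(status) > 1 && length(unique(status)) == 1)
                .SD[sample(1:.N, 1)]
             else .SD[order(status)][1],
           by = cat]
 #   cat item    status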
 #1:  b1   a3      good
 #2:  b2   a4 excellent
 #3:  b3   a3       bad
 #4:  b5   a1        ok


Running it again can return a different row for b3:

 df[, if (length(status) > 1 && length(unique(status)) == 1)
         .SD[sample(1:.N, 1)]
      else .SD[order(status)][1],
    by = cat]
 #   cat item    status
 #1:  b1   a3      good
 #2:  b2   a4 excellent
 #3:  b3   a4       bad
 #4:  b5   a1        ok
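
Note that the condition above only randomises when every row in a category shares the same status. A category such as good, good, bad would take the `else` branch and always return the first "good" row. A minimal sketch that samples among whichever rows are tied for the best status is given below; `pick_top` is a hypothetical helper name, not part of data.table.

```r
library(data.table)

dt <- data.table(cat    = c("b1","b2","b3","b2","b5","b1","b3"),
                 item   = c("a1","a2","a3","a4","a1","a3","a4"),
                 status = factor(c("ok","good","bad","excellent","ok","good","bad"),
                                 levels = c("excellent", "good", "ok", "bad")))

# Hypothetical helper: index of one randomly chosen row among those
# holding the best (lowest-coded) status in the group
pick_top <- function(s) {
  best <- which(as.integer(s) == min(as.integer(s)))
  # sample.int() avoids the sample(x) surprise when `best` has length 1
  best[sample.int(length(best), 1)]
}

res <- dt[, .SD[pick_top(status)], by = cat]
print(res)
```

With the sample data the only tie is in b3, so the result matches the two runs shown above except that the tie-breaking also covers partially tied groups.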

First put status in the correct order:

R>df$status <- ordered(df$status, c("bad", "ok", "good", "excellent"))

Then:

R>by(df, df$cat, function(d) d[which.max(d$status), ])
df$cat: b1
  cat item status
6  b1   a3   good
------------------------------------------------------------ 
df$cat: b2
  cat item    status
4  b2   a4 excellent
------------------------------------------------------------ 
df$cat: b3
  cat item status
3  b3   a3    bad
------------------------------------------------------------ 
df$cat: b5
  cat item status
5  b5   a1     ok
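
`which.max()` returns the first maximum, so ties are not broken at random, and `by()` prints one block per category rather than a single data frame. A base R sketch that addresses both points, assuming the same ordered factor, could look like this (`pick_one` is a hypothetical helper name):

```r
df <- data.frame(cat    = c("b1","b2","b3","b2","b5","b1","b3"),
                 item   = c("a1","a2","a3","a4","a1","a3","a4"),
                 status = c("ok","good","bad","excellent","ok","good","bad"))

# Worst to best, so max() gives the top status
df$status <- ordered(df$status, c("bad", "ok", "good", "excellent"))

# Hypothetical helper: one random row among those with the best status
pick_one <- function(d) {
  top <- which(d$status == max(d$status))
  d[top[sample.int(length(top), 1)], ]
}

# Split by category, pick one row per piece, and bind the pieces back together
res <- do.call(rbind, lapply(split(df, df$cat), pick_one))
print(res)
```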


