Subset a dataframe based on hierarchical preference of factor levels within a column in R

I have a dataframe that I would like to subset based on a hierarchical preference among the factor levels within a column. In the following example, I want to select only one "method" per level of "ID". Specifically, keep "CACL" if possible; if "CACL" does not exist for that ID, keep "KCL"; and if that does not exist either, keep "H2O".

ID<-c(1,1,1,2,2,3)
method<-c("CACL","KCL","H2O","H2O","KCL","H2O")
df1<-data.frame(ID,method)

  ID  method
1  1    CACL
2  1     KCL
3  1     H2O
4  2     H2O
5  2     KCL
6  3     H2O

ID<-c(1,2,3)
method<-c("CACL","KCL","H2O")
df2<-data.frame(ID,method)

  ID  method
1  1    CACL
2  2     KCL
3  3     H2O

I have done similar subsetting before by selecting a minimum number within a level, but I am not able to adapt it to this case. I am wondering whether I should use ifelse here too?

#if present, choose rows containing "number" 2 instead of 1 (this column contained only the two numbers 1 and 2)

library(dplyr)
new<-df %>%
group_by(col1,col2,col3) %>%
summarize(number = ifelse(any(number > 1), min(number[number>1]),1))
dfnew<-merge(new,df,by=c("colxyz","number"),all.x=T)

You can use order with match and then simply !duplicated:

df1 <- df1[order(match(df1$method, c("CACL","KCL","H2O"))),]
df1[!duplicated(df1$ID),]
#  ID method
#1  1   CACL
#5  2    KCL
#6  3    H2O

#Variant not changing df1
i <- order(match(df1$method, c("CACL","KCL","H2O")))
df1[i[!duplicated(df1$ID[i])],]
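
To make the mechanics of the answer above easier to follow, here is a minimal sketch that first shows the rank that match assigns to each row and then wraps the same order/!duplicated idea in a function. The helper name pick_preferred and its arguments are hypothetical and not part of the original answer; it assumes the preference vector lists the methods from most to least preferred.

```r
# Hypothetical helper (not from the original answer): keep one row per ID,
# choosing the method that appears earliest in `pref`.
pick_preferred <- function(df, pref, id_col = "ID", method_col = "method") {
  # Position of each method in `pref`; NA if the method is not listed
  pref_rank <- match(as.character(df[[method_col]]), pref)
  # Sort row indices by rank; ties keep their original order, NAs go last
  i <- order(pref_rank)
  # Keep the first (most preferred) row seen for each ID
  df[i[!duplicated(df[[id_col]][i])], , drop = FALSE]
}

df1 <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3),
  method = c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
)
pref <- c("CACL", "KCL", "H2O")

pref_rank <- match(df1$method, pref)
pref_rank
# 1 2 3 3 2 3

res <- pick_preferred(df1, pref)
res
# rows 1, 5 and 6 of df1: CACL for ID 1, KCL for ID 2, H2O for ID 3
```

Because order puts NA last by default, an ID whose only methods are missing from pref would still keep one row in this sketch; drop those rows beforehand if that is not wanted.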

An option using dplyr:

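library(dplyr)

df1 %>% 
  # rank each method by its position in the preference vector
  mutate(preference = match(method,  c("CACL","KCL","H2O"))) %>% 
  group_by(ID) %>% 
  # keep the most preferred (lowest rank) method per ID
  filter(preference == min(preference)) %>% 
  select(-preference)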

# A tibble: 3 x 2
# Groups:   ID [3]
     ID method
  <dbl> <fct> 
1     1 CACL  
2     2 KCL   
3     3 H2O 
