
Subset dataframe based on hierarchical preference of factor levels within column in R

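I have a dataframe that I would like to subset based on a hierarchical preference of the factor levels within a column. Using the example below: for each level of "ID" I want to select only one "method". Specifically, keep "CACL" if possible; if "CACL" is not present for that ID, keep "KCL"; and if that is not present either, keep "H2O".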

ID<-c(1,1,1,2,2,3)
method<-c("CACL","KCL","H2O","H2O","KCL","H2O")
df1<-data.frame(ID,method)

  ID  method
1  1    CACL
2  1     KCL
3  1     H2O
4  2     H2O
5  2     KCL
6  3     H2O

Desired output:

ID<-c(1,2,3)
method<-c("CACL","KCL","H2O")
df2<-data.frame(ID,method)

  ID  method
1  1    CACL
2  2     KCL
3  3     H2O

I have done a similar subset before by choosing a minimum number within a level, but I could not adapt it to this case. Should I also use ifelse here?

#if present, choose rows containing "number" 2 instead of 1 (this column contained only the two numbers 1 and 2)

library(dplyr)
new <- df %>%
  group_by(col1, col2, col3) %>%
  summarize(number = ifelse(any(number > 1), min(number[number > 1]), 1))
dfnew <- merge(new, df, by = c("colxyz", "number"), all.x = TRUE)
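For comparison with the ifelse idea raised in the question, here is a minimal base R sketch that picks one method per ID with an explicit if/else chain. best_method is a hypothetical helper, not part of the question or the answers. The chain hardcodes the preference CACL > KCL > H2O and returns "H2O" even when a group contains none of the three, which is why a match-based ranking generalizes better.

```r
ID <- c(1, 1, 1, 2, 2, 3)
method <- c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
df1 <- data.frame(ID, method)

# Hypothetical helper: return the most preferred method present in one ID group.
# The chain hardcodes the preference CACL > KCL > H2O and falls back to "H2O".
best_method <- function(m) {
  if ("CACL" %in% m) {
    "CACL"
  } else if ("KCL" %in% m) {
    "KCL"
  } else {
    "H2O"
  }
}

# Apply the helper to the methods of each ID; one row per ID comes back.
res <- aggregate(method ~ ID, data = df1, FUN = best_method)
res
```

Adding a fourth method means editing the chain by hand, whereas the answers below only need a longer preference vector.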

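You can use order together with match and then simply subset with !duplicated. match turns each method into its position in the preference vector, order sorts the rows by that position, and !duplicated then keeps the first (most preferred) row of each ID: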

df1 <- df1[order(match(df1$method, c("CACL","KCL","H2O"))),]
df1[!duplicated(df1$ID),]
#  ID method
#1  1   CACL
#5  2    KCL
#6  3    H2O

# Variant that does not modify df1
i <- order(match(df1$method, c("CACL","KCL","H2O")))
df1[i[!duplicated(df1$ID[i])],]
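To see why this works, the sketch below traces the intermediate vectors for the example data. The names pref, r, i and keep are illustrative only. It relies on order() being stable (ties keep their input order), so within an ID the most preferred row always comes first.

```r
ID <- c(1, 1, 1, 2, 2, 3)
method <- c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
df1 <- data.frame(ID, method)
pref <- c("CACL", "KCL", "H2O")

# Step 1: position of each row's method in the preference vector (1 = most preferred).
r <- match(df1$method, pref)        # 1 2 3 3 2 3
# Step 2: row indices sorted by that position; ties keep their original order.
i <- order(r)                       # 1 2 5 3 4 6
# Step 3: in sorted order, the first row seen for each ID is its preferred one.
keep <- i[!duplicated(df1$ID[i])]   # 1 5 6
df1[keep, ]
```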

An option using dplyr:

library(dplyr)

df1 %>% 
  mutate(preference = match(method,  c("CACL","KCL","H2O"))) %>% 
  group_by(ID) %>% 
  filter(preference == min(preference)) %>% 
  select(-preference)

# A tibble: 3 x 2
# Groups:   ID [3]
     ID method
  <dbl> <fct> 
1     1 CACL  
2     2 KCL   
3     3 H2O 
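Since the title mentions factor levels, the preference can also be stored directly as the factor level order: order() sorts a factor by its level positions rather than alphabetically. A minimal base R sketch, assuming the same three levels in the order CACL, KCL, H2O; it sorts by ID first so the result also comes out ordered by ID.

```r
ID <- c(1, 1, 1, 2, 2, 3)
method <- c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
df1 <- data.frame(ID, method)

# Encode the preference as the factor level order (assumed: CACL > KCL > H2O).
df1$method <- factor(df1$method, levels = c("CACL", "KCL", "H2O"))

# Sort by ID, then by method; the factor sorts by level position.
sorted <- df1[order(df1$ID, df1$method), ]

# Keep the first (most preferred) row of each ID.
res <- sorted[!duplicated(sorted$ID), ]
res
```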


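Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.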
