
How can I make conditional selections using dplyr in R?

I have the following situation. Given the table

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789))

I am trying to find a way to implement the following logic: if both types (MC and MK) occur for an ID, use value1 from MK and value2 from MC. Otherwise (only the type MC occurs), use both values from MC.

Hence, the final result is supposed to be:

data.frame(ID = c(1, 2, 3, 4),
           type = c("MC", "MC", "MC", "MC"),
           value1 = c(512, 4523, 1221, 2556),
           value2 = c(726, 4000, 998, 6789))

This assumes the MK row is dropped once its value1 has been extracted.

Another version with dplyr:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(value1 = ifelse(any(type == "MK"), value1[type=="MK"],value1[type=="MC"]), 
         value2 = value2[type == "MC"]) %>%
  filter(type == "MC")

#     ID type  value1 value2
#  <dbl> <fct>  <dbl>  <dbl>
#1     1 MC       512    726
#2     2 MC      4523   4000
#3     3 MC      1221    998
#4     4 MC      2556   6789

Here, for value1 we take the "MK" value if one is present and fall back to the "MC" value otherwise; for value2 we always take the "MC" value. Finally, we keep only the rows with type "MC". This assumes every group (ID) has an "MC" row.
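The assumption stated above can be checked before running the pipeline. The following is a minimal sketch rather than part of the original answer; `bad_ids` and `n_mc` are names introduced here for illustration, and the check lists every ID that does not have exactly one "MC" row.

```r
library(dplyr)

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789))

# Count the "MC" rows per ID and keep only the IDs where that count
# is not exactly one (hypothetical helper names: n_mc, bad_ids).
bad_ids <- df %>%
  group_by(ID) %>%
  summarise(n_mc = sum(type == "MC")) %>%
  filter(n_mc != 1)

# Number of IDs that violate the assumption
nrow(bad_ids)
```

For the sample data every ID has exactly one "MC" row, so `bad_ids` is empty and the mutate/filter approach is safe to apply.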

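data.table solution (this relies on the MC row coming first and the MK row coming last within each ID):

library(data.table)
as.data.table(df)[, {x = .SD; if (all(c("MC", "MK") %in% type)) {x$value1[] = last(value1)}; first(x)}, by = ID]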

Result:

#  ID type value1 value2
#1  1   MC    512    726
#2  2   MC   4523   4000
#3  3   MC   1221    998
#4  4   MC   2556   6789
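If the rows are not guaranteed to be ordered with MC first and MK last, the same logic can be written by selecting on `type` instead of on position. This is a sketch under the assumption that each ID has exactly one "MC" row and at most one "MK" row; `dt` and `res` are names introduced here for illustration.

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789))

# Build one row per ID. Inside .(), `type` on the right-hand side
# still refers to the original column of the current group.
res <- dt[, .(type = "MC",
              value1 = if ("MK" %in% type) value1[type == "MK"] else value1[type == "MC"],
              value2 = value2[type == "MC"]),
          by = ID]

res
```

Selecting by `type` costs a little more typing than `first()` and `last()`, but the result no longer depends on how the input happens to be sorted.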

dplyr:

df %>% group_by(ID) %>% do(., (function(x) {if (all(c("MC", "MK") %in% x$type)) {x$value1[] = x$value1[x$type == "MK"]}; x[1, ]})(.))

# A tibble: 4 x 4
# Groups:   ID [4]
#     ID type  value1 value2
#  <dbl> <fct>  <dbl>  <dbl>
#1     1 MC       512    726
#2     2 MC      4523   4000
#3     3 MC      1221    998
#4     4 MC      2556   6789

For efficiency I would definitely prefer @Andre Elrico's answer, but here is a dplyr option. Try:

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789)) 
library(dplyr)
df %>%
  reshape(., idvar = "ID", timevar = "type", direction = "wide") %>%
  group_by(ID) %>%
  mutate(value1 = ifelse(is.na(value1.MK), value1.MC, value1.MK),
         value2 = ifelse(is.na(value2.MC), value2.MK, value2.MC),
         type = "MC") %>%
  select(ID, type, value1, value2)
# output
# A tibble: 4 x 4
# Groups:   ID [4]
#      ID  type value1 value2
#   <dbl> <chr>  <dbl>  <dbl>
# 1     1    MC    512    726
# 2     2    MC   4523   4000
# 3     3    MC   1221    998
# 4     4    MC   2556   6789
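The same widen-then-pick idea can also be sketched with `tidyr::pivot_wider()` and `dplyr::coalesce()` in place of base `reshape()` and `ifelse(is.na(...))`. This is an alternative formulation, not the code from the answer above, and it assumes the tidyr package is installed; `res` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
                 type = c("MC", "MC", "MK", "MC", "MK", "MC"),
                 value1 = c(512, 261, 4523, 1004, 1221, 2556),
                 value2 = c(726, 4000, 280, 998, 113, 6789))

res <- df %>%
  # One row per ID, with columns value1_MC, value1_MK, value2_MC, value2_MK
  pivot_wider(names_from = type, values_from = c(value1, value2)) %>%
  # coalesce() returns the first non-missing value, element by element
  mutate(type = "MC",
         value1 = coalesce(value1_MK, value1_MC),
         value2 = coalesce(value2_MC, value2_MK)) %>%
  select(ID, type, value1, value2)

res
```

`coalesce(a, b)` expresses "use a, and fall back to b where a is NA", which is what the `ifelse(is.na(...))` calls do.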
