[英]How can I make conditional selections using dplyr in R?
I have the following situation. 我有以下情况。 Given the table
给定表
df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
type = c("MC", "MC", "MK", "MC", "MK", "MC"),
value1 = c(512, 261, 4523, 1004, 1221, 2556),
value2 = c(726, 4000, 280, 998, 113, 6789))
I am trying to find a way to implement the following logic: If for an ID, both types (MC and MK) occur, use value1 from MK and value2 from MC. 我试图找到一种实现以下逻辑的方法:如果对于ID,两种类型(MC和MK)都出现,请使用MK中的value1和MC中的value2。 Otherwise (only the type MC occurs), use MC.
否则(仅出现MC类型),请使用MC。
Hence, the final result is supposed to be: 因此,最终结果应该是:
data.frame(ID = c(1, 2, 3, 4),
type = c("MC", "MC", "MC", "MC"),
value1 = c(512, 4523, 1221, 2556),
value2 = c(726, 4000, 998, 6789))
Assuming the type MK is dropped after extracting the value1. 假设在提取value1之后删除了MK类型。
Another version with dplyr
dplyr
另一个版本
library(dplyr)
df %>%
group_by(ID) %>%
mutate(value1 = ifelse(any(type == "MK"), value1[type=="MK"],value1[type=="MC"]),
value2 = value2[type == "MC"]) %>%
filter(type == "MC")
# ID type value1 value2
# <dbl> <fct> <dbl> <dbl>
#1 1 MC 512 726
#2 2 MC 4523 4000
#3 3 MC 1221 998
#4 4 MC 2556 6789
Here, for value1
we check value in "MK" if it is present or take corresponding "MC" value instead and for value2
by default we take "MC" value and keep only rows with type
"MC". 在这里,对于
value1
我们检查“ MK”中的值(如果存在)或取其相应的“ MC”值;对于value2
,默认情况下,我们采用“ MC”值,并且仅保留type
“ MC”的行。 This is assuming every group ( ID
) would have a "MC" type
row. 假设每个组(
ID
)都有一个“ MC” type
行。
data.table
solution data.table
解决方案
setDT(df1)[,{x=.SD;if(all(c("MC","MK") %in% type)){x$value1[] = last(value1)};first(x)},by=ID]
result: 结果:
# ID type value1 value2
#1 1 MC 512 726
#2 2 MC 4523 4000
#3 3 MC 1221 998
#4 4 MC 2556 6789
dplyr
: dplyr
:
df1 %>% group_by(ID) %>% do(.,(function(x){if(all(c("MC","MK") %in% x$type)){x$value1[] = x$value1[x$type=="MK"]};x[1,]})(.))
# A tibble: 4 x 4
# Groups: ID [4]
# ID type value1 value2
# <dbl> <fct> <dbl> <dbl>
#1 1 MC 512 726
#2 2 MC 4523 4000
#3 3 MC 1221 998
#4 4 MC 2556 6789
For efficiency I would definitely prefer @Andre Elrico' answer but here is a dplyr
option. 为了提高效率,我绝对希望使用@Andre Elrico的答案,但这是一个
dplyr
选项。 Try: 尝试:
df <- data.frame(ID = c(1, 2, 2, 3, 3, 4),
type = c("MC", "MC", "MK", "MC", "MK", "MC"),
value1 = c(512, 261, 4523, 1004, 1221, 2556),
value2 = c(726, 4000, 280, 998, 113, 6789))
library(dplyr)
df %>%
reshape(., idvar = "ID", timevar = "type", direction = "wide") %>%
group_by(ID) %>%
mutate(value1 = ifelse(is.na(value1.MK), value1.MC, value1.MK),
value2 = ifelse(is.na(value2.MC), value2.MK, value2.MC),
type = "MC") %>%
select(ID, type, value1, value2)
# output
# A tibble: 4 x 4
# Groups: ID [4]
ID type value1 value2
<dbl> <chr> <dbl> <dbl>
1 1 MC 512 726
2 2 MC 4523 4000
3 3 MC 1221 998
4 4 MC 2556 6789
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.