Mutate and case when issue - dplyr
I have the following data and I want to make a new column using mutate: where colour is 'g', take the level on the 'g' row minus the level on the 'r' row.
Likewise with type: where type is 'type1', take the corresponding level minus the level on the 'type2' row.
library(dplyr)
d <- tibble(
date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76))
Just to be clear, this is what I want the data to look like:
d2 <- tibble(
date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76),
color_gap = c(NA, 54, NA, NA, 9, NA, NA, NA, NA, NA),
type_gap = c(11, NA, NA, NA, NA, NA, NA, NA, NA, NA))
I started to use mutate and case_when and got to the code below. However, I'm stuck on the final calculation part. How do I say I want to take the colour 'g' level minus the colour 'r' level?
d %>%
mutate(color_gap = case_when(colour == "g" ~ level)) %>%
mutate(type_gap = case_when(type == "type1" ~ level)) -> d2
Does anyone know how to complete this?
Thanks
This subtracts the first 'r' level from the first 'g' level, the second 'r' level from the second 'g' level, and so on. The same applies to 'type1' and 'type2'.
This has no checks at all. It doesn't check whether there is a matching 'r' for each 'g', whether they are in the expected order, whether they are in the same date group, etc. It assumes the data is already formatted exactly as expected, so be careful using this on real data.
d %>%
mutate(color_gap = replace(rep(NA, n()), colour == 'g',
level[colour == 'g'] - level[colour == 'r']),
type_gap = replace(rep(NA, n()), type == 'type1',
level[type == 'type1'] - level[type == 'type2']))
# # A tibble: 10 x 6
# date colour type level color_gap type_gap
# <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 2018 none type1 78 NA 11
# 2 2018 g none 99 54 NA
# 3 2018 r none 45 NA NA
# 4 2019 none type2 67 NA NA
# 5 2019 g none 87 9 NA
# 6 2019 r none 78 NA NA
# 7 2020 none none 89 NA NA
# 8 2020 none none 87 NA NA
# 9 2020 none none 67 NA NA
# 10 2020 none none 76 NA NA
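The caveat about missing checks could be addressed with a small guard. Below is a minimal sketch, assuming a hypothetical helper named `paired_gap` (not part of dplyr) that stops when the number of flagged rows differs. It still pairs rows purely by position and does not check order or date groups.

```r
library(dplyr)

d <- tibble(
  date   = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
  colour = c("none", "g", "r", "none", "g", "r", "none", "none", "none", "none"),
  type   = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
  level  = c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76)
)

# Hypothetical helper (an assumption for illustration, not a dplyr function):
# positional gap between rows flagged `from` and rows flagged `to`,
# NA on every other row.
paired_gap <- function(level, from, to) {
  # Guard: positional subtraction only makes sense with equal counts.
  stopifnot(sum(from) == sum(to))
  out <- rep(NA_real_, length(level))
  out[from] <- level[from] - level[to]
  out
}

d2 <- d %>%
  mutate(color_gap = paired_gap(level, colour == "g", colour == "r"),
         type_gap  = paired_gap(level, type == "type1", type == "type2"))
```

With unequal counts of 'g' and 'r' rows, `stopifnot` raises an error instead of silently recycling values.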
You could do this with group_by and mutate.
I assumed that there is only 1 row per date that would satisfy each condition.
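d %>%
mutate(color_gap = case_when(colour == "g" ~ level)) %>%
mutate(type_gap = case_when(type == "type1" ~ level)) %>%
group_by(date) %>%
# Note: diff is the 'g' level minus the 'type1' level within each date;
# dates lacking either value give Inf or NaN, with a warning from max().
mutate(diff = max(color_gap, na.rm = TRUE) - max(type_gap, na.rm = TRUE))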
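For comparison, here is a sketch of how the group_by idea might produce the colour gap the question asks for, under the same assumption of at most one 'g' and one 'r' row per date. The name `d_grouped` is only illustrative. In the sample data 'type1' and 'type2' fall in different years, so a per-date grouping cannot compute the type gap; this sketch covers the colour gap only.

```r
library(dplyr)

d <- tibble(
  date   = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
  colour = c("none", "g", "r", "none", "g", "r", "none", "none", "none", "none"),
  type   = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
  level  = c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76)
)

d_grouped <- d %>%
  group_by(date) %>%
  # match() returns the position of the first 'r' row in the group,
  # or NA when the group has none, in which case the gap is NA.
  mutate(color_gap = if_else(colour == "g",
                             level - level[match("r", colour)],
                             NA_real_)) %>%
  ungroup()
```

Because the subtraction happens inside each date group, a 'g' row is never paired with an 'r' row from another year.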