Mutating new column based on groups
Is there a way to group rows together based on a common column value (id), then mutate a new column with a new id (new.id) based on whether the values within each group are above and/or below 1000? Such as:
< 1000 = "low/low" (where all values in that group are below 1000)
< 1000 and > 1000 = "low/high" (where some values are below and some are above 1000)
> 1000 = "high/high" (where all values are above 1000)
Data
#Example
id values
1 a 200
2 a 300
3 b 100
4 b 2000
5 b 3000
6 c 4000
7 c 2000
8 c 3000
9 d 2400
10 d 2000
11 d 400
#dataframe:
df <- structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "c",
"d", "d", "d"), values = c(200, 300, 100, 2000, 3000, 4000, 2000,
3000, 2400, 2000, 400)), class = "data.frame", row.names = c(NA,
-11L))
Desired output
id values new.id
1 a 200 low/low
2 a 300 low/low
3 b 100 low/high
4 b 2000 low/high
5 b 3000 low/high
6 c 4000 high/high
7 c 2000 high/high
8 c 3000 high/high
9 d 2400 low/high
10 d 2000 low/high
11 d 400 low/high
A dplyr solution would be great, but I am open to any others!
library(dplyr)

df %>%
  group_by(id) %>%
  # all() is evaluated once per group, so every row in a group gets the same label
  mutate(new.id = case_when(
    all(values < 1000) ~ "low/low",
    all(values > 1000) ~ "high/high",
    TRUE ~ "low/high"
  ))
Alternatively, you can use the recode function from dplyr.
df %>%
  group_by(id) %>%
  mutate(
    new.id = dplyr::recode(
      sum(values > 1000) / length(values),
      `0` = "low/low",
      `1` = "high/high",
      .default = "low/high"
    )
  )
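To see why recoding on 0 and 1 works, it may help to look at the fraction of values above 1000 in each group. The following is a minimal base R sketch using the example data from the question; the name frac_above is only illustrative and is not part of the answer above.

```r
# Example data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400),
  stringsAsFactors = FALSE
)

# Fraction of values above 1000 per id (the mean of a logical vector is the share of TRUEs)
frac_above <- tapply(df$values > 1000, df$id, mean)
print(frac_above)
```

A fraction of exactly 0 means no value in the group is above 1000, exactly 1 means all of them are, and anything in between is a mixed group, which is what .default catches.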
If you would also like to keep a total count per group:
df %>%
  group_by(id) %>%
  add_tally() %>%  # adds a column n with the number of rows in each group
  mutate(new.id = dplyr::recode(
    sum(values > 1000) / n,
    `0` = "low/low",
    `1` = "high/high",
    .default = "low/high"
  ))
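Since the question is also open to non-dplyr approaches, here is a rough base R sketch of the same grouping logic. The helper label_group and its threshold argument are hypothetical names introduced for illustration. As in the dplyr versions, a value exactly equal to 1000 is neither "low" nor "high" under the strict comparisons, so a group containing one ends up as "low/high"; the question does not say how that case should be treated.

```r
# Example data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label one group's values against a threshold
label_group <- function(v, threshold = 1000) {
  if (all(v < threshold)) {
    "low/low"
  } else if (all(v > threshold)) {
    "high/high"
  } else {
    "low/high"
  }
}

# One label per id, as a character vector named by id
labels <- vapply(split(df$values, df$id), label_group, character(1))

# Look up each row's label by its id to expand back to one value per row
df$new.id <- unname(labels[as.character(df$id)])
print(df)
```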