Mutate a new column based on groups

Question

Is there a way to group rows together based on a common column value (id), and then mutate a new column with a new id (new.id) based on whether the values within each group are above and/or below 1000? Such as:

< 1000 = "low/low" (all values in the group are below 1000)
< 1000 and > 1000 = "low/high" (some values in the group are below 1000 and some are above)
> 1000 = "high/high" (all values in the group are above 1000)

Data

#Example
  id values
1   a    200
2   a    300
3   b    100
4   b   2000
5   b   3000
6   c   4000
7   c   2000
8   c   3000
9   d   2400
10  d   2000
11  d    400

#dataframe:
df <- structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "c", 
"d", "d", "d"), values = c(200, 300, 100, 2000, 3000, 4000, 2000, 
3000, 2400, 2000, 400)), class = "data.frame", row.names = c(NA, 
-11L))

Desired output

   id values    new.id
1   a    200   low/low
2   a    300   low/low
3   b    100  low/high
4   b   2000  low/high
5   b   3000  low/high
6   c   4000 high/high
7   c   2000 high/high
8   c   3000 high/high
9   d   2400  low/high
10  d   2000  low/high
11  d    400  low/high

A dplyr solution would be great, but I'm open to other approaches too!
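Since non-dplyr approaches are welcome, here is a minimal base R sketch of the same three rules. The helper name `classify` is hypothetical, and it assumes a value of exactly 1000 counts as neither low nor high, so a group containing one falls through to "low/high".

```r
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

# Hypothetical helper: label one group's values using the rules above.
# Assumption: exactly 1000 is neither "low" nor "high", so it ends up in "low/high".
classify <- function(v) {
  if (all(v < 1000)) {
    "low/low"
  } else if (all(v > 1000)) {
    "high/high"
  } else {
    "low/high"
  }
}

# One label per id, returned as a named character vector (names = ids)
labels <- vapply(split(df$values, df$id), classify, character(1))

# Look each row's id up by name, then drop the names
df$new.id <- unname(labels[df$id])
```

Computing one label per group and indexing it back by name avoids coercion surprises that can occur when `ave()` returns character results for a numeric input.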

Answer 1

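library(dplyr)

df %>% 
  group_by(id) %>%
  mutate(new.id = case_when(
    all(values < 1000) ~ "low/low",    # every value in the group is below 1000
    all(values > 1000) ~ "high/high",  # every value in the group is above 1000
    TRUE ~ "low/high"                  # anything else: a mix of low and high
  ))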
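The question does not say what should happen to a value of exactly 1000. With the case_when() rules above, such a group is neither all below nor all above 1000, so the TRUE branch labels it "low/high". The sketch below checks this on a small made-up data frame (`edge` is hypothetical) and uses mutate()'s `.by` argument in place of group_by(), which assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical edge-case data: both groups touch the 1000 boundary
edge <- data.frame(
  id = c("e", "e", "f"),
  values = c(1000, 500, 1000)
)

out <- edge %>%
  mutate(
    new.id = case_when(
      all(values < 1000) ~ "low/low",
      all(values > 1000) ~ "high/high",
      TRUE ~ "low/high"  # also catches groups containing exactly 1000
    ),
    .by = id  # per-call grouping; requires dplyr >= 1.1.0
  )
```

If 1000 should count as low (or high), change the corresponding comparison to `<=` (or `>=`).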

Answer 2

Alternatively, you can use the recode() function from dplyr.


df %>% group_by(id) %>%
  mutate(
    new.id = dplyr::recode(
      # fraction of values above 1000: 0 = none, 1 = all, anything else = mixed
      sum(values > 1000) / length(values),
      `0` = "low/low",
      `1` = "high/high",
      .default = "low/high"
    )
  )
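As of dplyr 1.1.0, recode() is superseded in favour of case_match(). A minimal sketch of the same fraction-based idea with case_match() follows, assuming dplyr 1.1.0 or later; `mean(values > 1000)` gives the same fraction as `sum(values > 1000) / length(values)`.

```r
library(dplyr)

df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

out <- df %>%
  group_by(id) %>%
  mutate(
    # share of values above 1000 in the group: 0, 1, or something in between
    new.id = case_match(
      mean(values > 1000),
      0 ~ "low/low",
      1 ~ "high/high",
      .default = "low/high"
    )
  ) %>%
  ungroup()
```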

If you would also like to keep the total count per group:


df %>% group_by(id) %>%
  add_tally() %>%   # adds column n = number of rows in each id group
  mutate(new.id = dplyr::recode(
    sum(values > 1000) / n,
    `0` = "low/low",
    `1` = "high/high",
    .default = "low/high"
  ))
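If only one row per group is needed rather than a label on every row, a summarise() sketch can keep the group size alongside the label. The column name `n_high` is a hypothetical choice; comparing integer counts (`n_high == 0`, `n_high == n`) expresses the same rules without computing a fraction.

```r
library(dplyr)

df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

summary_tbl <- df %>%
  group_by(id) %>%
  summarise(
    n = n(),                        # rows in the group
    n_high = sum(values > 1000),    # rows above 1000
    new.id = case_when(
      n_high == 0 ~ "low/low",      # none above 1000
      n_high == n ~ "high/high",    # all above 1000
      TRUE ~ "low/high"             # a mix (or values equal to 1000)
    )
  )
```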

Answer 1 (score 0, posted 2020-11-10 02:27:14)

Answer 2 (score 0, posted 2020-11-10 04:46:53)