R: generate a new column based on groups and conditions

I have a data frame with 4 columns (part of it shown below).

The first column shows groups ordered by numbers: 1, 2, ....

I want to generate a new column "value4". For each group, if the group size is at least 3 and all the values in column "value1" are greater than 2 (> 2) or all are less than -2 (< -2), then the median of the group's values in column "value3" is calculated and put in column "value4" for every row of that group. Otherwise, the value from "value2" is copied to column "value4".

g   value1   value2  value3
1     1.1      8       1
1     1.2      8       1
1     1.3      9       1
2     3        10      5
2     4        11      5
2     5        0       4
2     6        1       6
3     -3       2       5
3     -4       3       10
3     -5       4       0
4     -3       1       0
4     -4       1       0

The output will be:

g   value1   value2  value3  value4
1     1.1      8       1       8  # for group "1", the values in "value1" are neither all >2 nor all < -2, so the values from column "value2" are taken
1     1.2      8       1       8
1     1.3      9       1       9
2     3        10      5       5  # for group "2", all the values in "value1" are >2, median of numbers 5,5,4,6 from column "value3" is calculated  
2     4        11      5       5
2     5        0       4       5
2     6        1       6       5
3     -3       2       5       5  # for group "3", all the values in "value1" are < -2, median of numbers 5,10,0 from column "value3" is calculated      
3     -4       3       10      5
3     -5       4       0       5
4     -3       1       0       1  # group size less than 3, so the values from column "value2" are taken
4     -4       1       0       1

I think I can use aggregate(), but I don't know how to integrate the conditions. I appreciate your time and help.

Based on the condition, we can use an if/else on the group size (n()): if the group has more than 2 rows and all of 'value1' is less than -2 or all of it is greater than 2, return the median of 'value3'; otherwise return 'value2'.

library(dplyr)
df1 %>%       
   group_by(g) %>%
   # the condition is a single TRUE/FALSE per group, so scalar && and || are used
   mutate(value4 = if(n() > 2 && (all(value1 > 2) || all(value1 < -2))) median(value3) 
       else value2)
# A tibble: 12 x 5
# Groups:   g [4]
#       g value1 value2 value3 value4
#   <int>  <dbl>  <int>  <int>  <dbl>
# 1     1    1.1      8      1      8
# 2     1    1.2      8      1      8
# 3     1    1.3      9      1      9
# 4     2    3       10      5      5
# 5     2    4       11      5      5
# 6     2    5        0      4      5
# 7     2    6        1      6      5
# 8     3   -3        2      5      5
# 9     3   -4        3     10      5
#10     3   -5        4      0      5
#11     4   -3        1      0      1
#12     4   -4        1      0      1

data

df1 <- structure(list(g = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
4L, 4L), value1 = c(1.1, 1.2, 1.3, 3, 4, 5, 6, -3, -4, -5, -3, 
-4), value2 = c(8L, 8L, 9L, 10L, 11L, 0L, 1L, 2L, 3L, 4L, 1L, 
1L), value3 = c(1L, 1L, 1L, 5L, 5L, 4L, 6L, 5L, 10L, 0L, 0L, 
0L)), class = "data.frame", row.names = c(NA, -12L))
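
Since the question mentions aggregate(), here is a rough base R sketch of the same rule for anyone who wants to avoid extra packages. It uses ave() rather than aggregate(), because ave() returns one value per row instead of one per group. The helper names use_median and group_median are made up for this illustration, and the data frame is rebuilt with plain numeric columns so the block runs on its own.

```r
# Same example data as above, rebuilt with numeric columns so this runs standalone
df1 <- data.frame(
  g      = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  value1 = c(1.1, 1.2, 1.3, 3, 4, 5, 6, -3, -4, -5, -3, -4),
  value2 = c(8, 8, 9, 10, 11, 0, 1, 2, 3, 4, 1, 1),
  value3 = c(1, 1, 1, 5, 5, 4, 6, 5, 10, 0, 0, 0)
)

# One flag per row: does the row's group have at least 3 rows and
# value1 either all above 2 or all below -2?
# ave() stores the flag as 0/1 in a numeric vector, hence as.logical().
use_median <- as.logical(ave(df1$value1, df1$g, FUN = function(v) {
  length(v) > 2 && (all(v > 2) || all(v < -2))
}))

# Median of value3 within each group, repeated for every row of the group
group_median <- ave(df1$value3, df1$g, FUN = median)

# Pick the group median where the flag is TRUE, otherwise keep value2
df1$value4 <- ifelse(use_median, group_median, df1$value2)
df1
```

Splitting the work into a per-row flag and a per-row median keeps the final step a plain vectorised ifelse(), at the cost of computing the median for groups that do not end up using it.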

You can use the package data.table as follows:

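library(data.table)
# as.numeric() on both branches: median() may return a double while value2 is
# integer, and a grouped := needs one consistent type (df1 is the data above).
setDT(df1)[, value4 := if(.N > 2 && (all(value1 > 2) || all(value1 < -2))) as.numeric(median(value3)) else as.numeric(value2), by = g]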

This is an ideal situation for case_when().*

You would like value4 to be calculated based on the following condition:

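If the group size is > 2 and the absolute value of every value1 in the group is > 2, take the median of value3. Otherwise use value2. Note that abs(value1) > 2 also accepts a group that mixes values above 2 and below -2; if all values must lie on the same side, use all(value1 > 2) | all(value1 < -2) instead.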

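library(dplyr)
df1 %>%
  group_by(g) %>%
  mutate(value4 = case_when(
    # as.numeric() keeps both outcomes the same type, which older dplyr versions require
    (n() > 2) & all(abs(value1) > 2) ~ as.numeric(median(value3)),
    TRUE ~ as.numeric(value2)
  ))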

*One would think if_else() would work here because there is only one condition, but for some reason it failed when I used all() in the condition. I think it was returning multiple values? It is unclear to me, but maybe someone else can explain.
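
A likely explanation for the if_else() failure in the footnote above: if_else() sizes its result from the condition, and all() collapses the condition to a single TRUE/FALSE per group, so a value2 with several rows no longer fits. The sketch below assumes that is the cause and works around it by repeating the condition once per row with rep(); the data frame is rebuilt with numeric columns so that both outcomes share a type, and res is just an illustrative name.

```r
library(dplyr)

# Same example data as above, with numeric columns so both branches are doubles
df1 <- data.frame(
  g      = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  value1 = c(1.1, 1.2, 1.3, 3, 4, 5, 6, -3, -4, -5, -3, -4),
  value2 = c(8, 8, 9, 10, 11, 0, 1, 2, 3, 4, 1, 1),
  value3 = c(1, 1, 1, 5, 5, 4, 6, 5, 10, 0, 0, 0)
)

res <- df1 %>%
  group_by(g) %>%
  mutate(value4 = if_else(
    # the per-group test is a single value; rep() stretches it to one per row
    rep(n() > 2 && (all(value1 > 2) || all(value1 < -2)), n()),
    median(value3),  # length 1, recycled to the group size
    value2           # already one value per row
  )) %>%
  ungroup()
res
```

With a length-one condition, a plain if/else as in the first answer is arguably the simpler choice; the rep() version is mainly useful to show why if_else() complained.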
