How to count number of binary observations within a group?

I am working with the following data:

Year  Month      Day   X      Binary    Color
2018  January    1     4.5        1     Red
2018  January    4     3.2        0     Red
2018  January    11    1.1        0     Blue
2018  February   7     5.4        1     Blue
2018  February   15    1.5        0     Red
2019  January    3     8.6        1     Red
2019  January    22    9.1        1     Blue
2019  January    23    5.5        1     Red
2019  February   5     6.9        0     Red
2019  February   10    1.8        0     Red
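
To make the example reproducible, the table above can be entered as a data frame. This is a minimal sketch: the name `df` is an assumption chosen to match the answers, and the integer types for Year, Day, and Binary are assumed from the printed output.

```r
# Rebuild the example data. The object name `df` is an assumption.
df <- data.frame(
  Year   = rep(c(2018L, 2019L), each = 5),
  Month  = c("January", "January", "January", "February", "February",
             "January", "January", "January", "February", "February"),
  Day    = c(1L, 4L, 11L, 7L, 15L, 3L, 22L, 23L, 5L, 10L),
  X      = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 9.1, 5.5, 6.9, 1.8),
  Binary = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L),
  Color  = c("Red", "Red", "Blue", "Blue", "Red",
             "Red", "Blue", "Red", "Red", "Red"),
  stringsAsFactors = FALSE  # keep Month and Color as character vectors
)
```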

I am looking to create a new column that counts the number of observations where the binary variable equals 1 for a given month of a given year:

Year  Month      Day   X      Binary    Color   Binary Count 
2018  January    1     4.5        1     Red        1
2018  January    4     3.2        0     Red        1
2018  January    11    1.1        0     Blue       1
2018  February   7     5.4        1     Blue       1
2018  February   15    1.5        0     Red        1
2019  January    3     8.6        1     Red        3
2019  January    22    9.1        1     Blue       3
2019  January    23    5.5        1     Red        3
2019  February   5     6.9        0     Red        0
2019  February   10    1.8        0     Red        0

I would also like to add a column that flags the highest observation when a month has more than one observation where Binary equals 1 and Color is Red.

Year  Month      Day   X      Binary    Color   Binary Count   HighestRed
2018  January    1     4.5        1     Red        1              0
2018  January    4     3.2        0     Red        1              0
2018  January    11    1.1        0     Blue       1              0
2018  February   7     5.4        1     Blue       1              0
2018  February   15    1.5        0     Red        1              0
2019  January    3     8.6        1     Red        3              1
2019  January    22    9.1        1     Blue       3              0
2019  January    23    5.5        1     Red        3              0
2019  February   5     6.9        0     Red        0              0
2019  February   10    1.8        0     Red        0              0

Thanks in advance!

You can do the following with dplyr:

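library(dplyr)

df %>%
  # One group per Year/Month combination
  group_by(Year, Month) %>%
  # Binary_Count: number of rows in the group with Binary == 1
  # HighestRed: flags the first 'Red' row of the group when Binary_Count > 1
  mutate(Binary_Count = sum(Binary),
         HighestRed = as.integer(Binary_Count > 1 &
                                 row_number() == match('Red', Color)))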

#    Year Month      Day     X Binary Color Binary_Count HighestRed
#   <int> <chr>    <int> <dbl>  <int> <chr>        <int>      <int>
# 1  2018 January      1   4.5      1 Red              1          0
# 2  2018 January      4   3.2      0 Red              1          0
# 3  2018 January     11   1.1      0 Blue             1          0
# 4  2018 February     7   5.4      1 Blue             1          0
# 5  2018 February    15   1.5      0 Red              1          0
# 6  2019 January      3   8.6      1 Red              3          1
# 7  2019 January     22   9.1      1 Blue             3          0
# 8  2019 January     23   5.5      1 Red              3          0
# 9  2019 February     5   6.9      0 Red              0          0
#10  2019 February    10   1.8      0 Red              0          0
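
The code above flags the first Red row in each group. That matches the expected output here because the first Red row of January 2019 also happens to have the largest X. If "highest observation" is read as the largest X among rows with Binary equal to 1 and Color Red, a variant along the following lines may be closer to the intent. It is only a sketch under that assumption; the helper column `red_hit` and the name `result` are illustrative and not part of the original answer.

```r
library(dplyr)

# Example data from the question, read from text so the block is self-contained
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Year Month    Day X   Binary Color
2018 January  1   4.5 1      Red
2018 January  4   3.2 0      Red
2018 January  11  1.1 0      Blue
2018 February 7   5.4 1      Blue
2018 February 15  1.5 0      Red
2019 January  3   8.6 1      Red
2019 January  22  9.1 1      Blue
2019 January  23  5.5 1      Red
2019 February 5   6.9 0      Red
2019 February 10  1.8 0      Red
")

result <- df %>%
  group_by(Year, Month) %>%
  mutate(
    Binary_Count = sum(Binary),
    # Rows that qualify: Binary is 1 and Color is Red (assumed reading)
    red_hit = Binary == 1 & Color == "Red",
    # Flag the qualifying row(s) with the largest X; the extra -Inf keeps
    # max() from warning when a group has no qualifying rows
    HighestRed = as.integer(
      Binary_Count > 1 & red_hit & X == max(X[red_hit], -Inf)
    )
  ) %>%
  ungroup() %>%
  select(-red_hit)
```

On the example data this gives the same Binary_Count and HighestRed columns as the expected output. If two qualifying rows tie for the largest X, both are flagged.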

An option with data.table:

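library(data.table)
# := adds both columns by reference, computed within each Year/Month group.
# The unary + converts the logical flag to an integer.
setDT(df)[, c("Binary_Count", "HighestRed") := .(sum(Binary),
               +(sum(Binary) > 1 & 
                  seq_len(.N) == which(Color == 'Red')[1])), 
              by = .(Year, Month)]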
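
For comparison, the same grouped logic can be sketched in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This mirrors the first-Red-row rule used in the answers above rather than adding anything new; the helper name `first_red` is illustrative.

```r
# Example data from the question, read from text so the block is self-contained
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Year Month    Day X   Binary Color
2018 January  1   4.5 1      Red
2018 January  4   3.2 0      Red
2018 January  11  1.1 0      Blue
2018 February 7   5.4 1      Blue
2018 February 15  1.5 0      Red
2019 January  3   8.6 1      Red
2019 January  22  9.1 1      Blue
2019 January  23  5.5 1      Red
2019 February 5   6.9 0      Red
2019 February 10  1.8 0      Red
")

# Count of Binary == 1 per Year/Month group, repeated on every row of the group
df$Binary_Count <- ave(df$Binary, df$Year, df$Month, FUN = sum)

# TRUE on the first 'Red' row of each Year/Month group
first_red <- ave(df$Color == "Red", df$Year, df$Month,
                 FUN = function(z) seq_along(z) == match(TRUE, z))

# Flag that row only when the group has more than one Binary == 1
df$HighestRed <- as.integer(df$Binary_Count > 1 & first_red)
```

As with the answers above, a group containing no Red rows would yield NA in the comparison, so that case would need separate handling.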
