How to count number of binary observations within a group?
I am working with the following data:
Year Month Day X Binary Color
2018 January 1 4.5 1 Red
2018 January 4 3.2 0 Red
2018 January 11 1.1 0 Blue
2018 February 7 5.4 1 Blue
2018 February 15 1.5 0 Red
2019 January 3 8.6 1 Red
2019 January 22 9.1 1 Blue
2019 January 23 5.5 1 Red
2019 February 5 6.9 0 Red
2019 February 10 1.8 0 Red
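For anyone who wants to reproduce this, the table above could be rebuilt as a data frame roughly as follows. This is only a sketch: the object name `df` and the column types (integer for Year, Day and Binary, character for Month and Color) are assumptions.

```r
# Sample data from the table above; `df` and the column types are assumptions
df <- data.frame(
  Year   = c(2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L),
  Month  = c("January", "January", "January", "February", "February",
             "January", "January", "January", "February", "February"),
  Day    = c(1L, 4L, 11L, 7L, 15L, 3L, 22L, 23L, 5L, 10L),
  X      = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 9.1, 5.5, 6.9, 1.8),
  Binary = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L),
  Color  = c("Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Red", "Red", "Red"),
  stringsAsFactors = FALSE
)
```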
I am looking to create a new column that counts the number of times the binary variable equals 1 within a given month of a given year:
Year Month Day X Binary Color Binary Count
2018 January 1 4.5 1 Red 1
2018 January 4 3.2 0 Red 1
2018 January 11 1.1 0 Blue 1
2018 February 7 5.4 1 Blue 1
2018 February 15 1.5 0 Red 1
2019 January 3 8.6 1 Red 3
2019 January 22 9.1 1 Blue 3
2019 January 23 5.5 1 Red 3
2019 February 5 6.9 0 Red 0
2019 February 10 1.8 0 Red 0
I would also like to add a column that flags the highest observation when there is more than one observation where Binary equals 1 and Color equals Red:
Year Month Day X Binary Color Binary Count HighestRed
2018 January 1 4.5 1 Red 1 0
2018 January 4 3.2 0 Red 1 0
2018 January 11 1.1 0 Blue 1 0
2018 February 7 5.4 1 Blue 1 0
2018 February 15 1.5 0 Red 1 0
2019 January 3 8.6 1 Red 3 1
2019 January 22 9.1 1 Blue 3 0
2019 January 23 5.5 1 Red 3 0
2019 February 5 6.9 0 Red 0 0
2019 February 10 1.8 0 Red 0 0
Thanks in advance!
You can do the following:
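library(dplyr)

df %>%
  group_by(Year, Month) %>%
  # Binary_Count: number of rows with Binary == 1 in each Year/Month group
  mutate(Binary_Count = sum(Binary),
         # HighestRed: 1 on the first 'Red' row of groups where Binary_Count > 1
         HighestRed = as.integer(Binary_Count > 1 &
                                   row_number() == match('Red', Color)))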
# Year Month Day X Binary Color Binary_Count HighestRed
# <int> <chr> <int> <dbl> <int> <chr> <int> <int>
# 1 2018 January 1 4.5 1 Red 1 0
# 2 2018 January 4 3.2 0 Red 1 0
# 3 2018 January 11 1.1 0 Blue 1 0
# 4 2018 February 7 5.4 1 Blue 1 0
# 5 2018 February 15 1.5 0 Red 1 0
# 6 2019 January 3 8.6 1 Red 3 1
# 7 2019 January 22 9.1 1 Blue 3 0
# 8 2019 January 23 5.5 1 Red 3 0
# 9 2019 February 5 6.9 0 Red 0 0
#10 2019 February 10 1.8 0 Red 0 0
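Note that the code above flags the first Red row of a qualifying group, which happens to coincide with the largest X in the sample data. If "highest" is meant literally, a stricter variant might look like the sketch below. It assumes that "highest" means the largest X among rows with Binary equal to 1 and Color equal to Red, and that the flag applies only when a group has more than one such row; the helper column `red_hit` and the object names `df` and `result` are hypothetical.

```r
library(dplyr)

# Sample data from the question; `df` is an assumed object name
df <- data.frame(
  Year   = c(2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L),
  Month  = c("January", "January", "January", "February", "February",
             "January", "January", "January", "February", "February"),
  Day    = c(1L, 4L, 11L, 7L, 15L, 3L, 22L, 23L, 5L, 10L),
  X      = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 9.1, 5.5, 6.9, 1.8),
  Binary = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L),
  Color  = c("Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Red", "Red", "Red"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Year, Month) %>%
  mutate(
    Binary_Count = sum(Binary),
    # Hypothetical helper: rows that are both Binary == 1 and Red
    red_hit = Binary == 1 & Color == "Red",
    # Flag the largest X among those rows, only if the group has more than one;
    # the extra -Inf keeps max() quiet when a group has no such rows
    HighestRed = as.integer(
      sum(red_hit) > 1 & red_hit & X == max(X[red_hit], -Inf)
    )
  ) %>%
  ungroup() %>%
  select(-red_hit)
```

If two qualifying rows tie on X, this sketch flags both of them.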
An option with data.table:
library(data.table)

# setDT() converts df to a data.table by reference; := adds both columns in place
setDT(df)[, c("Binary_Count", "HighestRed") := .(sum(Binary),
                                                 +(sum(Binary) > 1 &
                                                     seq_len(.N) == which(Color == 'Red')[1])),
          by = .(Year, Month)]
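If neither package is available, the count column alone could probably also be produced in base R with `ave()`, which applies a function per group and returns a vector in the original row order. A minimal sketch, assuming the data frame is called `df` and, to keep it short, holding only the columns needed for the count:

```r
# Only the columns needed for the count; `df` is an assumed object name
df <- data.frame(
  Year   = c(2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L),
  Month  = c("January", "January", "January", "February", "February",
             "January", "January", "January", "February", "February"),
  Binary = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Sum Binary within each Year/Month combination, aligned to the original rows
df$Binary_Count <- ave(df$Binary, df$Year, df$Month, FUN = sum)
```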