Comparing other rows with the current row and doing a conditional sum in R
The sample data looks like this.
data <- data.frame(Name=c('John','John','John','Mary','Mary'),First_value=c(1,3,2,4,5),Second_value=c(3,4,2,7,5),To_sum=c(1,2,3,4,5))
# Name First_value Second_value To_sum
# John 1 3 1
# John 3 4 2
# John 2 2 3
# Mary 4 7 4
# Mary 5 5 5
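I want to group by Name first and then, for each group, sum the column "To_sum" over all other rows in that group (excluding the current row) where the other row has a "First_value" greater than the current row's and a "Second_value" less than the current row's. If there are no values to sum for the current row, the new column should be 0.
The new column should look like this: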
# Name First_value Second_value To_sum New_column
# John 1 3 1 3
# John 3 4 2 0
# John 2 2 3 0
# Mary 4 7 4 5
# Mary 5 5 5 0
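The rule described above can be sketched in base R as a reference for checking any other solution. This is only a minimal sketch: the helper name `conditional_sum` is hypothetical and not part of the original question. Because both comparisons are strict, a row can never match itself, so the current row does not need to be excluded explicitly.

```r
# Sample data from the question
data <- data.frame(
  Name = c("John", "John", "John", "Mary", "Mary"),
  First_value = c(1, 3, 2, 4, 5),
  Second_value = c(3, 4, 2, 7, 5),
  To_sum = c(1, 2, 3, 4, 5)
)

# Hypothetical helper: for each row i of one group, sum To_sum over the rows
# whose First_value is larger and whose Second_value is smaller than row i's.
conditional_sum <- function(first, second, to_sum) {
  vapply(seq_along(first), function(i) {
    keep <- first > first[i] & second < second[i]
    sum(to_sum[keep])  # the sum of an empty selection is 0
  }, numeric(1))
}

# Apply the helper group by group, writing results back by row index
data$New_column <- NA_real_
for (idx in split(seq_len(nrow(data)), data$Name)) {
  data$New_column[idx] <- conditional_sum(
    data$First_value[idx], data$Second_value[idx], data$To_sum[idx]
  )
}

data$New_column
```

For the sample data this yields 3, 0, 0, 5, 0, matching the expected column.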
You can use dplyr together with purrr::map_dbl:
library(dplyr)

data %>%
  group_by(Name) %>%
  mutate(New_column = purrr::map_dbl(
    row_number(),
    ~ sum(To_sum[First_value > First_value[.x] & Second_value < Second_value[.x]])
  ))
# Name First_value Second_value To_sum New_column
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 John 1 3 1 3
#2 John 3 4 2 0
#3 John 2 2 3 0
#4 Mary 4 7 4 5
#5 Mary 5 5 5 0
After adding an id column, you can use a self-join followed by a filter:
data <- mutate(data, id = 1:n())

data %>%
  full_join(., ., by = "Name") %>%
  filter(id.x != id.y, First_value.x < First_value.y, Second_value.x > Second_value.y) %>%
  group_by(Name, id = id.x) %>%
  summarize(New_column = sum(To_sum.y)) %>%
  right_join(data) %>%
  mutate(New_column = coalesce(New_column, 0))
# Joining, by = c("Name", "id")
# # A tibble: 5 x 6
# # Groups: Name [2]
# Name id New_column First_value Second_value To_sum
# <fct> <int> <dbl> <dbl> <dbl> <dbl>
# 1 John 1 3 1 3 1
# 2 John 2 0 3 4 2
# 3 John 3 0 2 2 3
# 4 Mary 4 5 4 7 4
# 5 Mary 5 0 5 5 5
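The same self-join idea can also be sketched without dplyr, using base R's `merge`. This is an illustrative variant rather than the answer's own code; the intermediate names `pairs`, `hits`, and `sums` are assumptions chosen for the example.

```r
# Sample data from the question, plus a row id
data <- data.frame(
  Name = c("John", "John", "John", "Mary", "Mary"),
  First_value = c(1, 3, 2, 4, 5),
  Second_value = c(3, 4, 2, 7, 5),
  To_sum = c(1, 2, 3, 4, 5)
)
data$id <- seq_len(nrow(data))

# Self-join on Name: every row is paired with every row of the same group.
# Non-key columns get the suffix .x (current row) or .y (other row).
pairs <- merge(data, data, by = "Name", suffixes = c(".x", ".y"))

# Keep pairs where the other row has a larger First_value and a smaller
# Second_value; the strict comparisons already rule out self-pairs.
hits <- pairs[pairs$First_value.x < pairs$First_value.y &
                pairs$Second_value.x > pairs$Second_value.y, ]

# Sum To_sum of the matching rows per current-row id
sums <- tapply(hits$To_sum.y, hits$id.x, sum)

# Rows without any match keep the default of 0
data$New_column <- 0
data$New_column[match(as.integer(names(sums)), data$id)] <- as.numeric(sums)

data$New_column
```

Like the dplyr version, this builds all within-group pairs first, so memory use grows with the square of the group size; for large groups the row-by-row approach may be the safer choice.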