R: How to delete rows based on sum values of certain rows?
Apologies if the question title is confusing - I wasn't sure how to frame it.
I have the following data frame:
df <- data.frame(
comp_name = c("X", "A", "B", "C", "D", "Y", "E", "F", "G", "H", "Z", "J", "K", "L", "M"),
parent_comp_name = c("X", "X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z", "Z"),
country = c("US", "US", "UK", "France", "Germany", "France", "US", "UK", "France", "Germany", "Germany", "US", "UK", "France", "Germany"),
filing = c("Group", "Solo", "Solo", "Solo", "Solo", "Group", "Solo", "Solo", "Solo", "Solo", "Group", "Solo", "Solo", "Solo", "Solo"),
profit = c(540, 100, 125, 150, 165, 495, 150, 110, 110, 125, 550, 130, 250, 95, 100)
)
Data:
comp_name parent_comp_name country filing profit
1 X X US Group 540
2 A X US Solo 100
3 B X UK Solo 125
4 C X France Solo 150
5 D X Germany Solo 165
6 Y Y France Group 495
7 E Y US Solo 150
8 F Y UK Solo 110
9 G Y France Solo 110
10 H Y Germany Solo 125
11 Z Z Germany Group 550
12 J Z US Solo 130
13 K Z UK Solo 250
14 L Z France Solo 95
15 M Z Germany Solo 100
This data frame is a simplified version of the actual data I am working with.
I want to write a script that checks, for a given parent company (say X), whether the sum of profits across all Solo filings under that parent equals the profit of its Group filing. If it does, the Solo rows should be deleted.
I want the output table to look like this:
comp_name parent_comp_name country filing profit
1 X X US Group 540
2 Y Y France Group 495
3 Z Z Germany Group 550
4 J Z US Solo 130
5 K Z UK Solo 250
6 L Z France Solo 95
7 M Z Germany Solo 100
Here you can see that the solo filings for parent_comp_name X and Y have been removed, as their profits sum to the respective group profit. However, the rows for company Z were not removed, as the sum of its solo profits does not equal the group profit.
I am relatively new to R and do not know how to get started with this. Any help would be greatly appreciated. Thanks!
Basic idea: keep all rows with filing == "Group" and find which rows with filing == "Solo" to keep.
library(tidyverse)
keep <- df %>%
group_by(parent_comp_name, filing) %>%
summarise(s = sum(profit)) %>%
ungroup() %>%
pivot_wider(names_from = filing, values_from = s) %>%
filter(Group != Solo) %>%
pluck("parent_comp_name") %>%
as.character()
df %>%
filter(filing == "Group" | parent_comp_name %in% keep)
comp_name parent_comp_name country filing profit
1 X X US Group 540
2 Y Y France Group 495
3 Z Z Germany Group 550
4 J Z US Solo 130
5 K Z UK Solo 250
6 L Z France Solo 95
7 M Z Germany Solo 100
Here's a way with dplyr:
library(dplyr)
df %>%
group_by(parent_comp_name) %>%
filter(if(sum(profit[filing == 'Solo']) == sum(profit[filing != 'Solo']))
filing != 'Solo' else TRUE) %>%
ungroup
# comp_name parent_comp_name country filing profit
# <chr> <chr> <chr> <chr> <dbl>
#1 X X US Group 540
#2 Y Y France Group 495
#3 Z Z Germany Group 550
#4 J Z US Solo 130
#5 K Z UK Solo 250
#6 L Z France Solo 95
#7 M Z Germany Solo 100
For each parent_comp_name, if the sum of profit for filing == 'Solo' is equal to the sum of profit for the non-solo rows, drop the rows where filing == 'Solo'.
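The same rule can also be sketched in base R, without dplyr. This is only an illustrative alternative, not part of the answer above: the names solo_sum, group_sum, tol and drop_row are made up for the example, and the tolerance value is an assumption added because comparing doubles with == can fail when profits are not whole numbers.

```r
# Sample data from the question
df <- data.frame(
  comp_name = c("X", "A", "B", "C", "D", "Y", "E", "F", "G", "H", "Z", "J", "K", "L", "M"),
  parent_comp_name = rep(c("X", "Y", "Z"), each = 5),
  country = c("US", "US", "UK", "France", "Germany", "France", "US", "UK",
              "France", "Germany", "Germany", "US", "UK", "France", "Germany"),
  filing = rep(c("Group", "Solo", "Solo", "Solo", "Solo"), times = 3),
  profit = c(540, 100, 125, 150, 165, 495, 150, 110, 110, 125, 550, 130, 250, 95, 100)
)

# Per-parent totals of Solo and Group profit, repeated on every row of that parent.
# Multiplying by the logical mask zeroes out the rows of the other filing type.
solo_sum  <- ave(df$profit * (df$filing == "Solo"),  df$parent_comp_name, FUN = sum)
group_sum <- ave(df$profit * (df$filing == "Group"), df$parent_comp_name, FUN = sum)

# Hypothetical tolerance: treat the sums as equal if they differ by less than this
tol <- 1e-8

# Drop a row only if it is a Solo row and its parent's Solo total matches the Group total
drop_row <- df$filing == "Solo" & abs(solo_sum - group_sum) < tol
result <- df[!drop_row, ]
rownames(result) <- NULL
result
```

With the sample data, the Solo rows under X and Y are dropped and the rows under Z are kept, because the Z Solo profits sum to 575 rather than 550.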
Slightly longer version of Ronak's code but another approach:
library(dplyr)
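df %>%
  group_by(parent_comp_name) %>%
  # per-parent totals for the Group filing and for the Solo filings
  mutate(grp_profit = sum(profit[filing == 'Group']),
         solo_profit = sum(profit[filing == 'Solo'])) %>%
  # if the totals match, keep only the Group row; otherwise keep every row
  filter(if (grp_profit == solo_profit) filing == 'Group' else TRUE) %>%
  select(-c(grp_profit, solo_profit))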
# A tibble: 7 x 5
# Groups: parent_comp_name [3]
comp_name parent_comp_name country filing profit
<chr> <chr> <chr> <chr> <dbl>
1 X X US Group 540
2 Y Y France Group 495
3 Z Z Germany Group 550
4 J Z US Solo 130
5 K Z UK Solo 250
6 L Z France Solo 95
7 M Z Germany Solo 100