
R: How to delete rows based on sum values of certain rows?

Apologies if the question title is confusing - I wasn't sure how to frame it.

I have the following data frame:

df <- data.frame(
  comp_name = c("X", "A", "B", "C", "D", "Y", "E", "F", "G", "H", "Z", "J", "K", "L", "M"),
  parent_comp_name = c("X", "X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z", "Z"),
  country = c("US", "US", "UK", "France", "Germany", "France", "US", "UK", "France", "Germany", "Germany", "US", "UK", "France", "Germany"),
  filing = c("Group", "Solo", "Solo", "Solo", "Solo", "Group", "Solo", "Solo", "Solo", "Solo", "Group", "Solo", "Solo", "Solo", "Solo"),
  profit = c(540, 100, 125, 150, 165, 495, 150, 110, 110, 125, 550, 130, 250, 95, 100)
)

Data:

  comp_name parent_comp_name country filing profit
1          X                X      US  Group    540
2          A                X      US   Solo    100
3          B                X      UK   Solo    125
4          C                X  France   Solo    150
5          D                X Germany   Solo    165
6          Y                Y  France  Group    495
7          E                Y      US   Solo    150
8          F                Y      UK   Solo    110
9          G                Y  France   Solo    110
10         H                Y Germany   Solo    125
11         Z                Z Germany  Group    550
12         J                Z      US   Solo    130
13         K                Z      UK   Solo    250
14         L                Z  France   Solo     95
15         M                Z Germany   Solo    100

This data frame is a simplified version of the actual data I am working with.

I want to write a script which checks the following: for a given parent company (say X), if the profits of all the Solo filings with parent_comp_name X sum to the profit of the Group filing, delete the Solo rows.

I want the output table to look like this:

  comp_name parent_comp_name country filing profit
1         X                X      US  Group    540
2         Y                Y  France  Group    495
3         Z                Z Germany  Group    550
4         J                Z      US   Solo    130
5         K                Z      UK   Solo    250
6         L                Z  France   Solo     95
7         M                Z Germany   Solo    100

Here you can see that the solo filings for parent_comp_name X and Y have been removed, as their profits sum to the respective group profit (540 for X, 495 for Y). However, the rows for company Z were not removed, as the sum of its solo profits (575) does not equal the group profit (550).
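The comparison described above can be checked by hand before any rows are deleted. The following is a minimal base R sketch, not part of the original question; the `check` table and its column names are hypothetical, and the `country` column is left out because it plays no role in the sums.

```r
# Example data copied from the question (country column omitted)
df <- data.frame(
  comp_name = c("X", "A", "B", "C", "D", "Y", "E", "F", "G", "H", "Z", "J", "K", "L", "M"),
  parent_comp_name = rep(c("X", "Y", "Z"), each = 5),
  filing = rep(c("Group", "Solo", "Solo", "Solo", "Solo"), times = 3),
  profit = c(540, 100, 125, 150, 165, 495, 150, 110, 110, 125, 550, 130, 250, 95, 100)
)

is_group <- df$filing == "Group"
is_solo  <- df$filing == "Solo"

# Sum profit per parent company, separately for Group and Solo rows
group_total <- tapply(df$profit[is_group], df$parent_comp_name[is_group], sum)
solo_total  <- tapply(df$profit[is_solo],  df$parent_comp_name[is_solo],  sum)

# `check` is a hypothetical helper table: one row per parent company
check <- data.frame(
  parent_comp_name = names(group_total),
  group_total = as.numeric(group_total),
  solo_total = as.numeric(solo_total[names(group_total)])
)
check$matches <- check$group_total == check$solo_total
print(check)
```

For the example data, `matches` is TRUE for X and Y and FALSE for Z, which is why only the Solo rows of Z should survive.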

I am relatively new to R and do not know how to get started with this. Any help would be greatly appreciated. Thanks!

Basic idea: keep all rows with filing == "Group", then work out which rows with filing == "Solo" to keep.

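library(tidyverse)

# Parent companies whose Solo profits do NOT sum to the Group profit
keep <- df %>%
  group_by(parent_comp_name, filing) %>%
  summarise(s = sum(profit)) %>%
  ungroup() %>%
  # One row per parent company, with columns Group and Solo
  pivot_wider(names_from = filing, values_from = s) %>%
  filter(Group != Solo) %>%
  pluck("parent_comp_name") %>%
  as.character()

# Keep every Group row, plus the Solo rows of the parent companies found above
df %>%
  filter(filing == "Group" | parent_comp_name %in% keep)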

  comp_name parent_comp_name country filing profit
1         X                X      US  Group    540
2         Y                Y  France  Group    495
3         Z                Z Germany  Group    550
4         J                Z      US   Solo    130
5         K                Z      UK   Solo    250
6         L                Z  France   Solo     95
7         M                Z Germany   Solo    100

Here's a way with dplyr:

library(dplyr)

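df %>%
  group_by(parent_comp_name) %>%
  # If the Solo total equals the non-Solo total, keep only the non-Solo rows;
  # otherwise keep every row of the group
  filter(if (sum(profit[filing == 'Solo']) == sum(profit[filing != 'Solo']))
           filing != 'Solo' else TRUE) %>%
  ungroup()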

# comp_name parent_comp_name country filing profit
#  <chr>     <chr>            <chr>   <chr>   <dbl>
#1 X         X                US      Group     540
#2 Y         Y                France  Group     495
#3 Z         Z                Germany Group     550
#4 J         Z                US      Solo      130
#5 K         Z                UK      Solo      250
#6 L         Z                France  Solo       95
#7 M         Z                Germany Solo      100

Within each parent_comp_name, if the sum of profit for filing == 'Solo' equals the sum of profit for the non-Solo rows, drop the rows where filing == 'Solo'. Otherwise, keep all rows of that group.
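For readers who prefer not to load dplyr, the same rule can be sketched in base R with `ave()`, which returns a per-group total aligned with every row. This is an illustrative alternative rather than the answerer's method; the names `solo_sum`, `other_sum`, and `result` are hypothetical, and the `country` column is omitted for brevity.

```r
# Example data copied from the question (country column omitted)
df <- data.frame(
  comp_name = c("X", "A", "B", "C", "D", "Y", "E", "F", "G", "H", "Z", "J", "K", "L", "M"),
  parent_comp_name = rep(c("X", "Y", "Z"), each = 5),
  filing = rep(c("Group", "Solo", "Solo", "Solo", "Solo"), times = 3),
  profit = c(540, 100, 125, 150, 165, 495, 150, 110, 110, 125, 550, 130, 250, 95, 100)
)

# For every row, the Solo total and the non-Solo total of its parent company
solo_sum  <- ave(ifelse(df$filing == "Solo", df$profit, 0), df$parent_comp_name, FUN = sum)
other_sum <- ave(ifelse(df$filing != "Solo", df$profit, 0), df$parent_comp_name, FUN = sum)

# Drop a row only when it is a Solo row AND the two totals match
result <- df[!(df$filing == "Solo" & solo_sum == other_sum), ]
rownames(result) <- NULL
print(result)
```

With the example data this leaves seven rows: the Group rows for X, Y, and Z, followed by the Solo rows J, K, L, and M.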

A slightly longer version of Ronak's code, using a different approach:

library(dplyr)
df %>%
  group_by(parent_comp_name) %>%
  mutate(grp_profit = sum(profit[filing == 'Group']),
         solo_profit = sum(profit[filing == 'Solo'])) %>%
  filter(if (grp_profit == solo_profit) filing == 'Group' else TRUE) %>%
  select(-c(grp_profit, solo_profit))
# A tibble: 7 x 5
# Groups:   parent_comp_name [3]
  comp_name parent_comp_name country filing profit
  <chr>     <chr>            <chr>   <chr>   <dbl>
1 X         X                US      Group     540
2 Y         Y                France  Group     495
3 Z         Z                Germany Group     550
4 J         Z                US      Solo      130
5 K         Z                UK      Solo      250
6 L         Z                France  Solo       95
7 M         Z                Germany Solo      100
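One caveat, since the question says the real data differs from this simplified example: the answers above compare totals with `==`, which is exact. That works for the whole-number profits shown here, but if the actual profits contain fractions, floating-point rounding can make two totals that should match compare as unequal. Below is a minimal base R sketch of a tolerance-based comparison. The data frame `df_frac`, the helper `same_total`, and the tolerance `1e-8` are all hypothetical choices for illustration; in a dplyr pipeline, `dplyr::near()` does a similar tolerance comparison.

```r
# Hypothetical data with fractional profits: 0.1 + 0.2 should match the Group profit 0.3
df_frac <- data.frame(
  comp_name = c("P", "Q", "R"),
  parent_comp_name = c("P", "P", "P"),
  filing = c("Group", "Solo", "Solo"),
  profit = c(0.3, 0.1, 0.2)
)

# Exact comparison fails because of binary floating-point rounding
exact_match <- (0.1 + 0.2) == 0.3

# Hypothetical helper: treat totals as equal when they differ by less than `tol`
same_total <- function(a, b, tol = 1e-8) abs(a - b) < tol

# Per-row Solo and Group totals of each parent company
solo_sum  <- ave(ifelse(df_frac$filing == "Solo",  df_frac$profit, 0),
                 df_frac$parent_comp_name, FUN = sum)
group_sum <- ave(ifelse(df_frac$filing == "Group", df_frac$profit, 0),
                 df_frac$parent_comp_name, FUN = sum)

# Drop Solo rows whose parent totals match within the tolerance
result_frac <- df_frac[!(df_frac$filing == "Solo" & same_total(solo_sum, group_sum)), ]
print(exact_match)
print(result_frac)
```

Here `exact_match` is FALSE, yet the tolerance-based filter still removes the two Solo rows and keeps only the Group row P. The right tolerance depends on the scale and precision of the real profit figures.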
