Conditional row sums when values are duplicated across columns in R

Working on a bit of a tricky problem. My data set is as follows:

df <- data.frame("WS_bTIV" = c(5,0,10),"WS_cTIV" = c(0,5,10),"EQ_bTIV"=c(5,10,10),"EQ_cTIV"=c(10,5,10))

> df
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV
1       5       0       5      10
2       0       5      10       5
3      10      10      10      10

I am trying to create a total column that sums the columns ending in "bTIV" (and likewise for "cTIV"), regardless of their prefix. However, the data is duplicated across some columns. For instance, look at row 1:

Both WS_bTIV and EQ_bTIV have a value of 5, so a plain sum gives 10. I know from the data that the true total is 5: the same value has been duplicated across these columns, so the total in this case should be 5.

Sometimes, however (e.g. in row 2), one of the values is 0 and the columns can be summed as normal.
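
The rule described above can be checked by hand on the bTIV values of rows 1 and 2. This is only a small sketch; the vector names `row1_bTIV` and `row2_bTIV` are made up for illustration, and it assumes (as stated in the question) that equal values within a row are always duplicates rather than two genuinely separate amounts.

```r
# Row 1: the value 5 appears in both bTIV columns, so it should count once.
row1_bTIV <- c(WS_bTIV = 5, EQ_bTIV = 5)
sum(row1_bTIV)          # 10: a plain sum double-counts the duplicate
sum(unique(row1_bTIV))  # 5: duplicates are dropped before summing

# Row 2: 0 and 10 are distinct, so de-duplicating changes nothing.
row2_bTIV <- c(WS_bTIV = 0, EQ_bTIV = 10)
sum(unique(row2_bTIV))  # 10
```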

The output should be as follows:

  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10

Does anyone have any ideas?

Sum the unique bTIV and cTIV values in each row:

df$Tot_bTIV <- apply(df[grepl("bTIV$",colnames(df))], 1, function(x) sum(unique(x)))
df$Tot_cTIV <- apply(df[grepl("cTIV$",colnames(df))], 1, function(x) sum(unique(x)))


> df
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10
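
If more suffixes turn up later, the two `apply()` calls could be wrapped in a loop. The following is a rough sketch in base R; `add_unique_totals` is a hypothetical helper name, not something from the answers, and it assumes every column of interest is named `<prefix>_<suffix>`.

```r
df <- data.frame(WS_bTIV = c(5, 0, 10), WS_cTIV = c(0, 5, 10),
                 EQ_bTIV = c(5, 10, 10), EQ_cTIV = c(10, 5, 10))

# Hypothetical helper: adds one Tot_<suffix> column per requested suffix.
add_unique_totals <- function(data, suffixes) {
  for (s in suffixes) {
    total_name <- paste0("Tot_", s)
    # Columns ending in _<suffix>, excluding a total column from an earlier run.
    cols <- grep(paste0("_", s, "$"), names(data), value = TRUE)
    cols <- setdiff(cols, total_name)
    # Row-wise sum of the distinct values in those columns.
    data[[total_name]] <- apply(data[cols], 1, function(x) sum(unique(x)))
  }
  data
}

result <- add_unique_totals(df, c("bTIV", "cTIV"))
result
```

Excluding the existing `Tot_` column matters because `Tot_bTIV` itself ends in `_bTIV` and would otherwise be picked up if the helper were run twice on the same data frame.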
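
A tidyr and dplyr alternative: reshape to long form so that each row holds WS and EQ for one suffix, compute the total, then reshape back:

library(dplyr)
library(tidyr)

df %>% 
  # Keep track of the original row so the data can be widened again.
  mutate(row_id = seq_len(n())) %>%
  # "WS_bTIV" becomes column WS with group "bTIV", and so on.
  pivot_longer(
    -row_id,
    names_to = c(".value", "group"),
    names_pattern = "(.*)_(.*)"
  ) %>%
  group_by(row_id, group) %>%
  # Equal values are treated as one duplicated value; otherwise add them.
  mutate(Tot = if_else(WS == EQ, WS, WS + EQ)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = group,
    names_sep = "_",
    values_from = c(WS, EQ, Tot)
  ) %>%
  select(-row_id)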

Output:

# A tibble: 3 x 6
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10
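
The `if_else(WS == EQ, ...)` comparison is tied to exactly two prefixes. A possible variation, sketched here under the assumption that all column names follow the `<prefix>_<suffix>` pattern, keeps the data fully long and sums the distinct values per row and suffix, so it would also cope with a third prefix. The names `prefix`, `suffix`, `value` and `totals` are chosen for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(WS_bTIV = c(5, 0, 10), WS_cTIV = c(0, 5, 10),
                 EQ_bTIV = c(5, 10, 10), EQ_cTIV = c(10, 5, 10))

totals <- df %>%
  mutate(row_id = row_number()) %>%
  # One row per original row, prefix and suffix.
  pivot_longer(-row_id, names_to = c("prefix", "suffix"), names_sep = "_") %>%
  group_by(row_id, suffix) %>%
  # Sum the distinct values for each row and suffix.
  summarise(Tot = sum(unique(value)), .groups = "drop") %>%
  pivot_wider(names_from = suffix, values_from = Tot, names_prefix = "Tot_") %>%
  select(-row_id)

result <- bind_cols(df, totals)
result
```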

A combination of Daniel O's and det's answers, using dplyr:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(Tot_bTIV = sum(unique(c(WS_bTIV, EQ_bTIV))),
         Tot_cTIV = sum(unique(c(WS_cTIV, EQ_cTIV))))

Another option is c_across(), available since dplyr 1.0.0:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(Tot_bTIV = sum(unique(c_across(ends_with('bTIV')))), 
         Tot_cTIV = sum(unique(c_across(ends_with('cTIV')))))
# A tibble: 3 x 6
# Rowwise: 
#  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
#    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
#1       5       0       5      10        5       10
#2       0       5      10       5       10        5
#3      10      10      10      10       10       10
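
As the "# Rowwise:" line in the printout shows, the result is still a rowwise tibble, so later verbs would keep operating one row at a time. If that is not wanted, one option is to finish the pipeline with `ungroup()`. This is a minimal sketch of that variant; the name `result` is just for illustration.

```r
library(dplyr)

df <- data.frame(WS_bTIV = c(5, 0, 10), WS_cTIV = c(0, 5, 10),
                 EQ_bTIV = c(5, 10, 10), EQ_cTIV = c(10, 5, 10))

result <- df %>%
  rowwise() %>%
  mutate(Tot_bTIV = sum(unique(c_across(ends_with("bTIV")))),
         Tot_cTIV = sum(unique(c_across(ends_with("cTIV"))))) %>%
  ungroup()  # drop the rowwise grouping so later steps work on whole columns

result
```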
