
Conditional sums when column values are duplicated (by row) in R

I'm working on a bit of a tricky problem. My data set is as follows:

df <- data.frame("WS_bTIV" = c(5,0,10),"WS_cTIV" = c(0,5,10),"EQ_bTIV"=c(5,10,10),"EQ_cTIV"=c(10,5,10))

> df
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV
1       5       0       5      10
2       0       5      10       5
3      10      10      10      10

I am trying to create a total column that sums the columns ending in "bTIV", regardless of what they begin with (and likewise for "cTIV"). However, the data is duplicated across some columns. For instance, look at row 1:

Both the WS_bTIV and EQ_bTIV columns have a value of 5, so summing them gives 10. However, I know from the data that the true total is 5 and that the value 5 has simply been duplicated across these columns. So the total in this case should be just 5.

Sometimes, however (e.g. in row 2), one of the values is 0 and you can just sum as normal.

The output should be as follows:

  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10

Does anyone have any ideas?

Using the sum of the unique bTIV and cTIV values by row:

df$Tot_bTIV <- apply(df[grepl("bTIV$",colnames(df))], 1, function(x) sum(unique(x)))
df$Tot_cTIV <- apply(df[grepl("cTIV$",colnames(df))], 1, function(x) sum(unique(x)))


> df
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10
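
The two `apply()` lines above differ only in the suffix, so they can be folded into one loop. This is a minimal sketch rather than part of the original answer: the helper name `add_unique_totals` and its arguments are hypothetical. Note that `unique()` collapses any repeated value in a row, which matches the question's rule only if equal values always mean duplication.

```r
# Hypothetical helper (name and signature are illustrative, not from the thread).
# For each suffix, sum the distinct values in the matching columns, row by row.
add_unique_totals <- function(data, suffixes) {
  for (s in suffixes) {
    # Columns ending in the suffix, skipping any Tot_ columns already present
    cols <- grepl(paste0(s, "$"), colnames(data)) &
      !startsWith(colnames(data), "Tot_")
    data[[paste0("Tot_", s)]] <- apply(data[cols], 1, function(x) sum(unique(x)))
  }
  data
}

df <- data.frame(WS_bTIV = c(5, 0, 10), WS_cTIV = c(0, 5, 10),
                 EQ_bTIV = c(5, 10, 10), EQ_cTIV = c(10, 5, 10))
res <- add_unique_totals(df, c("bTIV", "cTIV"))
```

Here `res` should contain the four original columns plus `Tot_bTIV` and `Tot_cTIV`; adding another suffix only means extending the vector passed as `suffixes`.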
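
library(dplyr)  # mutate(), group_by(), if_else(), select()
library(tidyr)  # pivot_longer(), pivot_wider()

df %>%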
  mutate(row_id = seq_len(n())) %>%
  pivot_longer(
    -row_id,
    names_to = c(".value", "group"),
    names_pattern = "(.*)_(.*)"
  ) %>%
  group_by(row_id, group) %>%
  mutate(Tot = if_else(WS == EQ, WS, WS + EQ)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = group,
    names_sep = "_",
    values_from = c(WS, EQ, Tot)
  ) %>%
  select(-row_id)

OUTPUT

# A tibble: 3 x 6
  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
1       5       0       5      10        5       10
2       0       5      10       5       10        5
3      10      10      10      10       10       10
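
With only two prefixes per suffix, the comparison in the `mutate()` step can also be written directly in base R without reshaping. This is a sketch that assumes only the `WS_`/`EQ_` pairs from the question exist; it swaps `dplyr::if_else()` for base `ifelse()` and is not taken from the answer above.

```r
df <- data.frame(WS_bTIV = c(5, 0, 10), WS_cTIV = c(0, 5, 10),
                 EQ_bTIV = c(5, 10, 10), EQ_cTIV = c(10, 5, 10))

# If the two values in a row are equal, treat them as one duplicated value;
# otherwise add them. ifelse() is vectorised, so no row-wise loop is needed.
df$Tot_bTIV <- ifelse(df$WS_bTIV == df$EQ_bTIV, df$WS_bTIV, df$WS_bTIV + df$EQ_bTIV)
df$Tot_cTIV <- ifelse(df$WS_cTIV == df$EQ_cTIV, df$WS_cTIV, df$WS_cTIV + df$EQ_cTIV)
```

For two columns this pairwise rule and `sum(unique(x))` agree on every row. With three or more columns they are no longer interchangeable, so the pairwise form would need to be rethought.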

This is a combination of Daniel O's and det's answers, using dplyr:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(Tot_bTIV = sum(unique(c(WS_bTIV, EQ_bTIV))) ,
         Tot_cTIV = sum(unique(c(WS_cTIV, EQ_cTIV))))

Another option is c_across from dplyr 1.0.0:

library(dplyr)
df %>% 
     rowwise %>% 
     mutate(Tot_bTIV = sum(unique(c_across(ends_with('bTIV')))), 
            Tot_cTIV = sum(unique(c_across(ends_with('cTIV')))))
# A tibble: 3 x 6
# Rowwise: 
#  WS_bTIV WS_cTIV EQ_bTIV EQ_cTIV Tot_bTIV Tot_cTIV
#    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
#1       5       0       5      10        5       10
#2       0       5      10       5       10        5
#3      10      10      10      10       10       10


 