Sum across rows but only the cells that meet a condition
Sample data:
library(dplyr)
df <- tibble(x = c(0.1, 0.2, 0.3, 0.4),
             y = c(0.1, 0.1, 0.2, 0.3),
             z = c(0.1, 0.2, 0.2, 0.2))
df
# A tibble: 4 x 3
x y z
<dbl> <dbl> <dbl>
1 0.1 0.1 0.1
2 0.2 0.1 0.2
3 0.3 0.2 0.2
4 0.4 0.3 0.2
I want to sum across rows, adding up only the "cells" that meet a certain logical condition. In this example, I want to add up, row-wise, only the cells whose value is greater than or equal to a specified threshold.
Desired output
threshold <- 0.15
# A tibble: 4 x 4
x y z cond_sum
<dbl> <dbl> <dbl> <dbl>
1 0.1 0.1 0.1 0
2 0.2 0.1 0.2 0.4
3 0.3 0.2 0.2 0.7
4 0.4 0.3 0.2 0.9
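As a quick sanity check on the rule (not part of the original question), a plain base R loop can recompute the cond_sum column row by row. This is a minimal sketch that assumes every column of df is numeric and takes part in the sum; it rebuilds df as a base data.frame so it runs without any packages.

```r
# Base R reference check for the rule "sum only the cells >= threshold".
# Assumption: every column of df is numeric and takes part in the sum.
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

cond_sum <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  row_vals <- unlist(df[i, ])                          # row i as a numeric vector
  cond_sum[i] <- sum(row_vals[row_vals >= threshold])  # sum(numeric(0)) is 0
}
cond_sum
```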
Pseudo-code
This is the wrangling idea I have in mind.
df %>%
  rowwise() %>%
  mutate(cond_sum = sum(c_across(where(~ "cell" >= threshold))))
Tidy solutions appreciated!
An efficient option is to replace the values that are below the threshold with NA and make use of na.rm in rowSums instead of rowwise/c_across:
library(dplyr)
df %>%
  # cells below the threshold become NA; na.rm = TRUE then skips them
  mutate(cond_sum = rowSums(replace(., . < threshold, NA), na.rm = TRUE))
-output
# A tibble: 4 x 4
# x y z cond_sum
# <dbl> <dbl> <dbl> <dbl>
#1 0.1 0.1 0.1 0
#2 0.2 0.1 0.2 0.4
#3 0.3 0.2 0.2 0.7
#4 0.4 0.3 0.2 0.9
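To see what rowSums actually receives, here is a sketch of the two steps run separately on a base data.frame copy of the sample data (the intermediate name masked is only for illustration). A row whose cells are all NA sums to 0 with na.rm = TRUE, which is where the 0 in the first row comes from.

```r
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

# Step 1: df < threshold is a logical matrix; replace() sets those cells to NA.
masked <- replace(df, df < threshold, NA)
masked

# Step 2: rowSums() adds each row; na.rm = TRUE skips the NA cells,
# and a row that is entirely NA sums to 0.
rowSums(masked, na.rm = TRUE)
```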
Or with c_across
df %>%
  rowwise() %>%
  # collect the current row's values, then sum only those at or above the threshold
  mutate(cond_sum = {val <- c_across(everything())
                     sum(val[val >= threshold])}) %>%
  ungroup()
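A variant of the same rowwise/c_across idea, swapping in purrr::keep() to do the filtering, reads close to the pseudo-code in the question. This is a sketch rather than one of the original answers; it assumes the columns to sum are x:z and that purrr is installed.

```r
library(dplyr)
library(purrr)

df <- tibble(x = c(0.1, 0.2, 0.3, 0.4),
             y = c(0.1, 0.1, 0.2, 0.3),
             z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

result <- df %>%
  rowwise() %>%
  # keep() retains only the row's cells that pass the predicate;
  # sum() of an empty vector is 0, so rows with no qualifying cell get 0
  mutate(cond_sum = sum(keep(c_across(x:z), ~ .x >= threshold))) %>%
  ungroup()
result
```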
Or in base R
df$cond_sum <- rowSums(replace(df, df < threshold, NA), na.rm = TRUE)
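Another base R sketch, not taken from the original answers, avoids NA altogether: multiplying by the logical mask turns every cell below the threshold into 0. It assumes the columns to sum are listed explicitly in cols (a hypothetical helper variable), which also keeps an already-added cond_sum column or any non-numeric column out of the sum.

```r
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

cols <- c("x", "y", "z")            # hypothetical: the columns to be summed
m <- as.matrix(df[cols])

# (m >= threshold) is a logical matrix; in arithmetic TRUE acts as 1 and
# FALSE as 0, so cells below the threshold contribute nothing to the sum.
df$cond_sum <- rowSums(m * (m >= threshold))
df
```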
An option with dplyr and purrr could be:
library(purrr)
df %>%
  mutate(cond_sum = pmap_dbl(across(x:z), ~ sum(c(...)[c(...) >= threshold])))
x y z cond_sum
<dbl> <dbl> <dbl> <dbl>
1 0.1 0.1 0.1 0
2 0.2 0.1 0.2 0.4
3 0.3 0.2 0.2 0.7
4 0.4 0.3 0.2 0.9
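The ~ lambda in the pmap_dbl() answer can be hard to read because of the repeated c(...). Here is a sketch of the same call written with a named function; sum_above is a hypothetical helper name, and the sketch assumes threshold is defined in the calling environment.

```r
library(dplyr)
library(purrr)

df <- tibble(x = c(0.1, 0.2, 0.3, 0.4),
             y = c(0.1, 0.1, 0.2, 0.3),
             z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

# Hypothetical helper: pmap_dbl() calls it once per row with the row's cells
# as named arguments (x, y, z); c(...) gathers them into one numeric vector.
sum_above <- function(...) {
  vals <- c(...)
  sum(vals[vals >= threshold])
}

result <- df %>%
  mutate(cond_sum = pmap_dbl(across(x:z), sum_above))
result
```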
Or just using dplyr:
df %>%
  mutate(cond_sum = Reduce(`+`, across(x:z) * (across(x:z) >= threshold)))
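The Reduce() approach works because a data frame is a list of columns: multiplying by the comparison mask zeroes the cells that fail the threshold test, and Reduce(`+`, ...) folds + over the columns as (x + y) + z. A base R sketch of the same two steps, with masked as an illustrative name:

```r
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                 y = c(0.1, 0.1, 0.2, 0.3),
                 z = c(0.1, 0.2, 0.2, 0.2))
threshold <- 0.15

# Multiplying by the logical mask zeroes the cells below the threshold
# (FALSE counts as 0, TRUE as 1); the result is still a data frame.
masked <- df * (df >= threshold)

# A data frame is a list of columns, so Reduce() computes (x + y) + z
# element-wise, giving one conditional sum per row.
cond_sum <- Reduce(`+`, masked)
cond_sum
```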