Using “mutate” with conditional functions
I've been trying to use dplyr as a means to solve the following problem, but any other method would be appreciated.
I have this data frame:
df <- tibble(
x = sample(rep(c(0, 1),10),10),
a_1 = rnorm(10),
b_1 = rnorm(10),
c_1 = rnorm(10),
a_2 = rnorm(10),
b_2 = rnorm(10),
c_2 = rnorm(10),
...
)
My goal is to create a group of new variables a_2_temp, b_2_temp, ... in the same data frame, equal in value to a_2, b_2, ..., based on the values of x and one other variable (the matching a_1, b_1, ... column).
For example:
df%>% mutate(a_2_temp = (ifelse(x==1 & a_1 > 0, a_2, 0)))
Now, the thing is that I need a way to automate this so that I can use it together with across, in order to create new variables for a data frame with a couple hundred columns. I could simply repeat the code and change only the variable names, but that would be very hard with my actual data set, as it has a couple hundred variables:
df%>% mutate(a_2_temp = (ifelse(x==1 & a_1 > 0, a_2, 0))) %>%
mutate(b_2_temp = (ifelse(x==1 & b_1 > 0, b_2, 0))) %>%
mutate(c_2_temp = (ifelse(x==1 & c_1 > 0, c_2, 0))) %>%
mutate(d_2_temp = (ifelse(x==1 & d_1 > 0, d_2, 0))) %>%
...
So far the closest I have come to a solution is something like this:
eval<-function(a,b){
ifelse(b==1 & a>0, a, 0)
}
df<-df%>%mutate(across(c("a_1":"n_2"), list(temp=~eval(a=.x, b=x))))
However, this can only refer to the column it is currently operating on as the benchmark, while I want it to use *_1 as the benchmark and copy the value from *_2.
Here is an option with across. Loop over the columns whose names end with "_2" (selected with ends_with), and build the logical condition that x equals 1 and the corresponding "_1" column is greater than 0. The name of that "_1" column is obtained by replacing "_2" with "_1" in cur_column(), and get retrieves the column's values. Then return the "_2" column value where the condition holds, or else 0. The new columns are named by appending "_temp" as a suffix in .names ({.col} stands for the original column name).
library(dplyr)
library(stringr)
df1 <- df %>%
mutate(across(ends_with('_2'),
~ case_when(x == 1 & get(str_replace(cur_column(), '_2', '_1')) > 0 ~
.,
TRUE ~ 0), .names = '{.col}_temp'))
Output:
df1
# A tibble: 10 x 10
# x a_1 b_1 c_1 a_2 b_2 c_2 a_2_temp b_2_temp c_2_temp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2.73 -0.140 -0.782 2.07 0.364 0.245 2.07 0 0
# 2 1 -0.321 -0.114 0.333 -0.0401 -0.547 0.719 0 0 0.719
# 3 1 -0.753 -0.103 1.53 0.0359 1.85 1.36 0 0 1.36
# 4 0 -0.994 0.980 -0.651 1.13 -0.179 0.557 0 0 0
# 5 0 -0.639 1.01 -0.374 -0.325 0.475 0.287 0 0 0
# 6 1 0.450 -0.0441 -0.924 0.856 0.217 1.65 0.856 0 0
# 7 0 0.120 0.282 -0.931 -1.36 -0.0353 -1.82 0 0 0
# 8 0 -0.756 0.0408 -0.309 0.731 -0.169 0.153 0 0 0
# 9 1 0.140 0.494 1.65 0.912 -0.330 -0.0840 0.912 -0.330 -0.0840
#10 0 -0.928 1.16 -1.06 -1.59 0.0439 -1.08 0 0 0
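Because the output above comes from random data, a smaller hand-checkable case may help confirm the rule. The following is only a sketch: the data frame toy and its values are made up for illustration, and if_else is used in place of case_when purely for brevity. The column lookup with get and cur_column mirrors the answer's approach.

```r
library(dplyr)

# Hypothetical toy data, small enough to verify every cell by hand
toy <- tibble(
  x   = c(1, 1, 0, 1),
  a_1 = c(0.5, -1.0, 2.0, 3.0),
  a_2 = c(10, 20, 30, 40),
  b_1 = c(-2.0, 4.0, 1.0, 0.0),
  b_2 = c(5, 6, 7, 8)
)

res <- toy %>%
  mutate(across(
    ends_with("_2"),
    # keep the "_2" value only where x is 1 and the partner "_1" column is > 0
    ~ if_else(x == 1 & get(sub("_2$", "_1", cur_column())) > 0, .x, 0),
    .names = "{.col}_temp"
  ))

res$a_2_temp  # 10  0  0 40
res$b_2_temp  #  0  6  0  0
```

Row 4 of b_2_temp is 0 because b_1 is exactly 0 there, and the condition is strictly greater than 0.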
Also, as we just want to replace with 0, simple multiplication by the logical vector is enough: TRUE is coerced to 1 and FALSE to 0, so any value multiplied by FALSE becomes 0 and any value multiplied by TRUE is returned unchanged.
df %>%
mutate(across(ends_with('_2'),
~ . * (x == 1 & get(str_replace(cur_column(), '_2', '_1')) > 0),
.names = '{.col}_temp'))
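The coercion that the multiplication relies on can be checked in isolation with base R. This is a minimal sketch with made-up vectors (values and keep are hypothetical names), not part of the original answer:

```r
# Hypothetical values, chosen only to illustrate logical-to-numeric coercion
values <- c(2.5, -1.0, 4.0, 3.0)
keep   <- c(TRUE, FALSE, FALSE, TRUE)

masked  <- values * keep          # TRUE acts as 1, FALSE as 0
via_if  <- ifelse(keep, values, 0)

# Both routes agree when there are no missing values
agree <- isTRUE(all.equal(masked, via_if))

# With a missing value the two routes differ: NA * 0 is NA, not 0
na_product <- NA_real_ * FALSE
na_ifelse  <- ifelse(FALSE, NA_real_, 0)
```

So the multiplication shortcut is equivalent to the case_when version only when the "_2" columns contain no NA values; with missing data, the explicit conditional may be the safer choice.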
Another option is to split the data into chunks of columns with split.default, loop over the resulting list with map, do the transformation, and bind the new columns to the original data.
library(purrr)
df %>%
select(-x) %>%
split.default(str_remove(names(.), '_\\d+$')) %>%
map_dfc(~ .x[[2]] * (df[['x']] > 0 & .x[[1]] > 0)) %>%
rename_all(~ str_c(., '_2_temp')) %>%
bind_cols(df, .)
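To see what split.default does with the column names, here is a small base R sketch. The two-row data frame toy and the vector x are hypothetical, and lapply stands in for map so the example needs no packages.

```r
# Hypothetical two-row frame with the same naming scheme as the question
toy <- data.frame(
  a_1 = c(1, -1), b_1 = c(-1, 2),
  a_2 = c(10, 20), b_2 = c(30, 40)
)
x <- c(1, 1)

# Strip the trailing "_<digits>" so that a_1 and a_2 share the key "a"
keys   <- sub("_[0-9]+$", "", names(toy))
chunks <- split.default(toy, keys)

names(chunks)    # "a" "b"
names(chunks$a)  # "a_1" "a_2"

# Within each chunk, column 1 is the "_1" benchmark and column 2 the "_2" value
out <- lapply(chunks, function(ch) ch[[2]] * (x == 1 & ch[[1]] > 0))
out$a            # 10  0
out$b            #  0 40
```

Note that indexing by position assumes every "_1" column appears before its "_2" partner in the data frame; if the column order is not guaranteed, selecting by name inside each chunk would be more robust.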
Data:
df <- structure(list(x = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 0), a_1 = c(2.73310355409357,
-0.320612007980402, -0.753457274553722, -0.993806784470467, -0.638863336940367,
0.449760522371564, 0.119872527846818, -0.755664301704646, 0.139745073657684,
-0.92777433835819), b_1 = c(-0.139788654259498, -0.114412680908762,
-0.102836187925709, 0.980330559943683, 1.01472611411422, -0.0441288105926913,
0.2815151064984, 0.0407677709798372, 0.49417281865305, 1.16312935730339
), c_1 = c(-0.78179575165366, 0.33274093322335, 1.53346307214684,
-0.650564763278306, -0.373704486693932, -0.924228720715619, -0.931179032930509,
-0.309468200147579, 1.6513839050529, -1.06455672195892), a_2 = c(2.07296416623927,
-0.040135834336151, 0.0359118773308408, 1.13285720793684, -0.324655504171795,
0.856081768489117, -1.36456191552214, 0.730800040331243, 0.912096452304384,
-1.59124725717562), b_2 = c(0.36365730618185, -0.547314112818983,
1.850134670075, -0.178995695839892, 0.474832212746808, 0.216839372888426,
-0.0353431588238, -0.169393100775411, -0.330432553833477, 0.043945304544359
), c_2 = c(0.245070864427874, 0.71886275016605, 1.35567222367957,
0.556607205459845, 0.287483186639216, 1.65350317111755, -1.81872622002345,
0.152993150129941, -0.0840400626089268, -1.08300472554552)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))