Using “mutate” with conditional functions

I've been trying to use dplyr as a means to solve the following problem, but any other method would be appreciated.

I have this data frame:

df <- tibble(
  x = sample(rep(c(0, 1),10),10),
  a_1 = rnorm(10),
  b_1 = rnorm(10),
  c_1 = rnorm(10),
  a_2 = rnorm(10),
  b_2 = rnorm(10),
  c_2 = rnorm(10),
...
)

My goal is to create a group of new variables a_2_temp, b_2_temp, ... in the same data frame that take the values of a_2, b_2, ... depending on x and on one other variable (the matching a_1, b_1, ...).

Such as:

df%>% mutate(a_2_temp = (ifelse(x==1 & a_1 > 0, a_2, 0)))

Now, I need a way to automate this so that I can use it together with across to create the new variables for a data frame with a couple hundred columns. I could simply repeat the code and change the variable names each time, but that would be impractical with my actual data set:

df%>% mutate(a_2_temp = (ifelse(x==1 & a_1 > 0, a_2, 0))) %>%
 mutate(b_2_temp = (ifelse(x==1 & b_1 > 0, b_2, 0))) %>%
 mutate(c_2_temp = (ifelse(x==1 & c_1 > 0, c_2, 0))) %>%
 mutate(d_2_temp = (ifelse(x==1 & d_1 > 0, d_2, 0))) %>%
...

So far, the closest I have come to a solution is something like this:

eval<-function(a,b){
      ifelse(b==1 & a>0, a, 0)
      }

df<-df%>%mutate(across(c("a_1":"n_2"), list(temp=~eval(a=.x, b=x))))

However, this can only refer to the column it is currently processing as the benchmark, whereas I want it to use *_1 as the benchmark and copy the value from *_2.

Here is an option with across. Loop over the columns whose names end with "_2" (ends_with), and build a logical condition that is true when x is 1 and the corresponding "_1" column is greater than 0. That column is found by replacing '_2' with '_1' in the current column name (cur_column()) and fetching its values with get. Where the condition holds, return the '_2' value; otherwise return 0. The new columns are named by appending '_temp' as a suffix through .names ({.col} stands for the original column name).

library(dplyr)
library(stringr)
df1 <-  df %>% 
  mutate(across(ends_with('_2'), 
   ~ case_when(x == 1 &  get(str_replace(cur_column(), '_2', '_1')) > 0 ~
            .,
       TRUE ~ 0), .names = '{.col}_temp'))

Output:

df1
# A tibble: 10 x 10
#       x    a_1     b_1    c_1     a_2     b_2     c_2 a_2_temp b_2_temp c_2_temp
#   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
# 1     1  2.73  -0.140  -0.782  2.07    0.364   0.245     2.07     0       0     
# 2     1 -0.321 -0.114   0.333 -0.0401 -0.547   0.719     0        0       0.719 
# 3     1 -0.753 -0.103   1.53   0.0359  1.85    1.36      0        0       1.36  
# 4     0 -0.994  0.980  -0.651  1.13   -0.179   0.557     0        0       0     
# 5     0 -0.639  1.01   -0.374 -0.325   0.475   0.287     0        0       0     
# 6     1  0.450 -0.0441 -0.924  0.856   0.217   1.65      0.856    0       0     
# 7     0  0.120  0.282  -0.931 -1.36   -0.0353 -1.82      0        0       0     
# 8     0 -0.756  0.0408 -0.309  0.731  -0.169   0.153     0        0       0     
# 9     1  0.140  0.494   1.65   0.912  -0.330  -0.0840    0.912   -0.330  -0.0840
#10     0 -0.928  1.16   -1.06  -1.59    0.0439 -1.08      0        0       0     
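As a quick sanity check, the across() call can be compared with the hand-written ifelse() version on a small example. This is only a sketch: the tibble `toy` and the names `auto` and `manual` are made up for illustration and are not part of the question's data, and the pattern is anchored as "_2$" here (an assumption beyond the answer's code) so that only a trailing "_2" is replaced.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, chosen so every branch of the condition is exercised
toy <- tibble(
  x   = c(1, 1, 0, 0),
  a_1 = c(0.5, -0.5, 0.5, -0.5),
  a_2 = c(10, 20, 30, 40),
  b_1 = c(-1, 2, 3, 4),
  b_2 = c(1, 2, 3, 4)
)

# The name mapping used inside across(): "a_2" becomes "a_1"
str_replace("a_2", "_2$", "_1")

# Automated version: one formula applied to every "_2" column
auto <- toy %>%
  mutate(across(ends_with("_2"),
                ~ case_when(x == 1 & get(str_replace(cur_column(), "_2$", "_1")) > 0 ~ .x,
                            TRUE ~ 0),
                .names = "{.col}_temp"))

# Manual version: one mutate expression per column pair
manual <- toy %>%
  mutate(a_2_temp = ifelse(x == 1 & a_1 > 0, a_2, 0),
         b_2_temp = ifelse(x == 1 & b_1 > 0, b_2, 0))

# Both approaches should produce the same columns
all.equal(auto, manual)
```

If the two results agree on a case like this, the same across() call should scale to any number of "_1"/"_2" pairs, provided every "_2" column has a matching "_1" column.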

Also, since the replacement value is 0, multiplying by the logical vector is enough: TRUE is coerced to 1 and FALSE to 0, so a value multiplied by FALSE becomes 0 and a value multiplied by TRUE is unchanged.

df %>% 
  mutate(across(ends_with('_2'), 
   ~  . *(x == 1 &  get(str_replace(cur_column(), '_2', '_1')) > 0), 
       .names = '{.col}_temp'))
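The coercion this relies on can be tried on plain vectors. The sketch below uses made-up values (`vals`, `keep`, `na_result` are illustrative names) and also shows two details worth keeping in mind: a bare number inside `&` counts as TRUE whenever it is nonzero, so the comparison with 0 has to be written out, and a missing value stays missing after multiplication, unlike with ifelse() or case_when().

```r
# Logical values are coerced to 1 (TRUE) and 0 (FALSE) in arithmetic
vals <- c(2.5, -1.2, 4.0)
keep <- c(TRUE, FALSE, TRUE)
vals * keep

# A nonzero number is treated as TRUE by `&`, even when it is negative,
# so `x == 1 & col` is not the same as `x == 1 & col > 0`
TRUE & -0.3
TRUE & (-0.3 > 0)

# Caveat: NA times FALSE is still NA, whereas ifelse(FALSE, NA, 0) gives 0
na_result <- c(NA, 3) * c(FALSE, TRUE)
na_result
```

If the real data contain missing values and those should become 0 as well, the case_when() version is the safer choice.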

Another option is to split the data into chunks of columns with split.default, loop over the resulting list with map, do the transformation, and bind the new columns to the original data.

library(purrr)
df %>% 
   select(-x) %>% 
    split.default(str_remove(names(.), '_\\d+$')) %>%
    map_dfc(~ .x[[2]] * (df[['x']] > 0 & .x[[1]] > 0)) %>% 
    rename_all(~ str_c(., '_2_temp')) %>%
    bind_cols(df, .)
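To see what the split step produces, it may help to run it on its own. The sketch below uses base R only and a made-up data frame (`toy` and `groups` are illustrative names); sub() stands in for str_remove() with the same pattern.

```r
# Hypothetical data frame with two column pairs
toy <- data.frame(a_1 = 1:2, a_2 = 3:4, b_1 = 5:6, b_2 = 7:8)

# split.default() treats the data frame as a list of columns and groups
# them by the column name with the trailing "_<digits>" removed
groups <- split.default(toy, sub("_\\d+$", "", names(toy)))

names(groups)          # one element per prefix
names(groups[["a"]])   # the columns that share the prefix "a"
```

Each list element keeps its columns in their original order, which is why the answer can refer to them by position: .x[[1]] is the "_1" column and .x[[2]] is the "_2" column. That assumption breaks if a "_2" column comes before its "_1" partner in the data.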

Data:

df <- structure(list(x = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 0), a_1 = c(2.73310355409357, 
-0.320612007980402, -0.753457274553722, -0.993806784470467, -0.638863336940367, 
0.449760522371564, 0.119872527846818, -0.755664301704646, 0.139745073657684, 
-0.92777433835819), b_1 = c(-0.139788654259498, -0.114412680908762, 
-0.102836187925709, 0.980330559943683, 1.01472611411422, -0.0441288105926913, 
0.2815151064984, 0.0407677709798372, 0.49417281865305, 1.16312935730339
), c_1 = c(-0.78179575165366, 0.33274093322335, 1.53346307214684, 
-0.650564763278306, -0.373704486693932, -0.924228720715619, -0.931179032930509, 
-0.309468200147579, 1.6513839050529, -1.06455672195892), a_2 = c(2.07296416623927, 
-0.040135834336151, 0.0359118773308408, 1.13285720793684, -0.324655504171795, 
0.856081768489117, -1.36456191552214, 0.730800040331243, 0.912096452304384, 
-1.59124725717562), b_2 = c(0.36365730618185, -0.547314112818983, 
1.850134670075, -0.178995695839892, 0.474832212746808, 0.216839372888426, 
-0.0353431588238, -0.169393100775411, -0.330432553833477, 0.043945304544359
), c_2 = c(0.245070864427874, 0.71886275016605, 1.35567222367957, 
0.556607205459845, 0.287483186639216, 1.65350317111755, -1.81872622002345, 
0.152993150129941, -0.0840400626089268, -1.08300472554552)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))
