Mutate a value across several columns using dplyr selectors only

I want to calculate the standard deviation (sd) across several columns of a data frame without leaving my dplyr pipe. In the past, I have done this by falling back to base R. I haven't been able to find a solution here that works.

It may help to provide some context. This is a step I use to validate survey data. We measure the sd of matrix questions to identify straight-liners: an sd of zero across the columns flags a respondent who gave the same answer to every item. In the past, I calculated this in base R as follows:

apply(x, 1, sd)
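For readers unfamiliar with the idiom, here is a minimal sketch of what that base R call does. The matrix `x` and its values are made up for illustration; only `apply(x, 1, sd)` comes from the question.

```r
# Hypothetical toy responses: rows are respondents, columns are matrix items
x <- matrix(
  c(3, 3, 3,
    1, 4, 2,
    5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("q1", "q2", "q3"))
)

# MARGIN = 1 applies sd() to each row, giving one value per respondent
row_sd <- apply(x, 1, sd)

# Rows whose answers never vary are the straight-liners
straight_liners <- which(row_sd == 0)
```

In this toy data the first and third rows are flagged, since all three answers in those rows are identical.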

I know there has to be a way to do this within a dplyr pipe. I've tried several options, including pmap and various approaches with mutate_at. Here's my latest attempt:

library(tidyverse)

set.seed(858465)
scale_points <- c(1:5)
q1 <- sample(scale_points, replace = TRUE, size = 100)
q2 <- sample(scale_points, replace = TRUE, size = 100)
q3 <- sample(scale_points, replace = TRUE, size = 100)


digits = 0:9
createRandString<- function() {
  v = c(sample(LETTERS, 5, replace = TRUE),
        sample(digits, 4, replace = TRUE),
        sample(LETTERS, 1, replace = TRUE))
  return(paste0(v,collapse = ""))
}

# Build 100 random respondent IDs
s_data <- tibble::tibble(resp_id = replicate(100, createRandString()))

s_data <- bind_cols(s_data, q1 = q1, q2 = q2, q3 = q3)

s_data %>% mutate(vars(starts_with("q"), ~sd(.)))

In a perfect world, I would keep the resp_id variable in the output so that I could generate a report using filter to identify the respondent IDs with sd == 0.

Any help is greatly appreciated!

If we need a row-wise sd, one option is pmap_dbl from purrr:

library(tidyverse)
s_data %>% 
   mutate(sdQs =  select(., starts_with("q")) %>% 
                           pmap_dbl(~ sd(c(...)))) %>% 
   filter(sdQs == 0)
# A tibble: 9 x 5
#  resp_id       q1    q2    q3  sdQs
#  <chr>      <int> <int> <int> <dbl>
#1 JORTY8990R     3     3     3     0
#2 TFYAF4729I     5     5     5     0
#3 VPUYC0789H     4     4     4     0
#4 LHAPM6293X     1     1     1     0
#5 FZQRQ8530P     3     3     3     0
#6 TKTJU3757T     5     5     5     0
#7 AYVHO1309H     4     4     4     0
#8 BBPTZ4822E     5     5     5     0
#9 NGLXT1705B     3     3     3     0
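To make the `pmap_dbl(~ sd(c(...)))` idiom easier to follow, here is a small sketch on a made-up data frame (`toy` and its values are hypothetical, not part of the original data). `pmap_dbl()` walks the columns in parallel, so the function is called once per row with that row's values as arguments, and `c(...)` collects them into a single vector for `sd()`.

```r
library(purrr)

# Hypothetical three-respondent example
toy <- data.frame(
  q1 = c(3, 1, 5),
  q2 = c(3, 4, 5),
  q3 = c(3, 2, 5)
)

# One call per row: the row's q1, q2 and q3 arrive as arguments,
# c(...) gathers them into a vector, and sd() returns one double
row_sd <- pmap_dbl(toy, ~ sd(c(...)))
```

Because the result is a plain numeric vector with one element per row, it can be assigned straight to a new column inside `mutate()`, which is how the answer above uses it.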

Another option is rowSds from the matrixStats package:

library(matrixStats)
s_data %>% 
    mutate(sdQs = rowSds(as.matrix(.[startsWith(names(.), "q")])))
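A further possibility, assuming dplyr 1.0.0 or later is installed, is to stay entirely within dplyr by combining `rowwise()` with `c_across()`, which accepts the same tidyselect helpers as `select()`. The sketch below uses a small made-up tibble (`toy`) in place of `s_data`; it is an illustration under that version assumption, not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in for s_data
toy <- tibble(
  resp_id = c("A", "B", "C"),
  q1 = c(3L, 1L, 5L),
  q2 = c(3L, 4L, 5L),
  q3 = c(3L, 2L, 5L)
)

flagged <- toy %>%
  rowwise() %>%                                      # treat each row as its own group
  mutate(sdQs = sd(c_across(starts_with("q")))) %>%  # sd over the q columns of that row
  ungroup() %>%                                      # drop the row-wise grouping again
  filter(sdQs == 0)                                  # keep only the straight-liners
```

This keeps resp_id in the output, so the flagged respondent IDs can be read off directly. On large data, row-wise evaluation tends to be slower than a matrix-based approach such as rowSds.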
