
Pivot all columns wider (except ID columns) using pivot_wider() in R/tidyverse

I'm trying to transform a data frame from long to wide in R. Specifically, I want to pivot all columns wider (except the columns that uniquely identify observations) using pivot_wider(). Here is a minimal working example:

library("tidyr")

set.seed(12345)

sampleSize <- 10
timepoints <- 3
raters <- 2

data_long <- data.frame(ID = rep(1:sampleSize, each = timepoints * raters),
                        # repeat each time point once per rater so the rows match the printed data below
                        time = rep(rep(1:timepoints, each = raters), times = sampleSize),
                        rater = rep(c("a","b"), times = sampleSize * timepoints),
                        v1 = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        v2 = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        v3 = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        v100 = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        vA = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        vB = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        vC = sample.int(99, sampleSize * timepoints * raters, replace = TRUE),
                        vZZ = sample.int(99, sampleSize * timepoints * raters, replace = TRUE))

Here are the data:

> tibble(data_long)
# A tibble: 60 x 11
      ID  time rater    v1    v2    v3  v100    vA    vB    vC   vZZ
   <int> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
 1     1     1 a        14    56    30    75    66    22     8    73
 2     1     1 b        90    44    99     8    36    72     1    78
 3     1     2 a        92    35    93    46     4    68    39    52
 4     1     2 b        51    91    50    67    43    72    99    74
 5     1     3 a        80    34    31    31    21    52     7    23
 6     1     3 b        24    86    25    86    20    43    74    89
 7     2     1 a        58    51    48    60     6    56    66    37
 8     2     1 b        96    95    76     1    78     2    65     3
 9     2     2 a        88    26    92    86     7    37    84    15
10     2     2 b        93    55    25    62    27    39    73    85
# ... with 50 more rows

In this example, three columns uniquely identify each observation: ID, time, and rater. I'd like to widen every other column by rater (i.e., every column except ID and time). My expected output is:

# A tibble: 30 x 18
      ID  time  v1_a  v1_b  v2_a  v2_b  v3_a  v3_b v100_a v100_b  vA_a  vA_b  vB_a  vB_b  vC_a  vC_b vZZ_a vZZ_b
   <int> <int> <int> <int> <int> <int> <int> <int>  <int>  <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1     1     1    14    90    56    44    30    99     75      8    66    36    22    72     8     1    73    78
 2     1     2    92    51    35    91    93    50     46     67     4    43    68    72    39    99    52    74
 3     1     3    80    24    34    86    31    25     31     86    21    20    52    43     7    74    23    89
 4     2     1    58    96    51    95    48    76     60      1     6    78    56     2    66    65    37     3
 5     2     2    88    93    26    55    92    25     86     62     7    27    37    39    84    73    15    85
 6     2     3    75     2    23    55    28     8     66     74    65    92    58    10    91    65     7    44
 7     3     1    86    94     7    87    78    85     38     87    36    49    89    83    33    34    32    38
 8     3     2    10    75    12    15    21    18     56     77    54    17    61    92    18    50    98    27
 9     3     3    38    81    46    90    20    47     88     15    33    95    66    19    12    27    84    52
10     4     1    32    38    88    68    77    71     10     81    21    54    33    16    90    41    29    72
# ... with 20 more rows

I can widen any given set of columns using the following syntax:

data_long %>% 
  pivot_wider(names_from = rater, values_from = c(v1, v2))

Thus, I could widen all columns by listing each of them manually in a vector:

data_long %>% 
  pivot_wider(names_from = rater, values_from = c(v1, v2, v3, v100, vA, vB, vC, vZZ))

However, this becomes unwieldy if I have many columns. Another approach is to widen columns by specifying a range of columns:

data_long %>% 
  pivot_wider(names_from = rater, values_from = v1:vZZ)

However, this approach does not work well if the columns to be widened do not form a single contiguous range, for instance if the ID columns are interspersed throughout the data frame (though it would be possible to specify multiple ranges).
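To illustrate the multiple-range workaround mentioned above, here is a minimal sketch on a small made-up data frame (the `toy` object and its values are assumptions for illustration, not the data from the question). The identifying columns sit between the value columns, so two ranges are combined with c():

```r
library(tidyr)

# Hypothetical toy data: time and rater sit between the value columns.
toy <- data.frame(
  ID    = c(1, 1, 2, 2),
  v1    = c(10, 11, 12, 13),
  v2    = c(20, 21, 22, 23),
  time  = c(1, 1, 1, 1),
  rater = c("a", "b", "a", "b"),
  v3    = c(30, 31, 32, 33),
  v4    = c(40, 41, 42, 43)
)

# Combine two column ranges in values_from; ID and time are left as
# identifier columns because they appear in neither names_from nor values_from.
wide <- pivot_wider(toy, names_from = rater, values_from = c(v1:v2, v3:v4))

names(wide)
```

This works, but every additional gap between value columns needs another range, which is why it does not scale to many columns.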

Is there a way to use pivot_wider() to widen ALL columns except those I specify via id_cols as uniquely identifying each observation (i.e., ID and time)? I'd like the solution to extend to the case where I have many columns (and thus do not want to specify variable names or ranges for the variables to be widened).

Since we know the first three columns (ID, time, and rater) should stay fixed, negate those column names with - in values_from:

library(dplyr)
library(tidyr)
data_long %>% 
   pivot_wider(names_from = rater, values_from = -names(.)[1:3])

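Or, if the identifying columns are already stored in an object, exclude them by name. Note that rater has to be excluded as well, because it is already used in names_from:

id_cols <- c("ID", "time")
data_long %>%
    pivot_wider(names_from = rater, values_from = -all_of(c(id_cols, "rater")))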
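If this pattern is needed repeatedly, the name-based exclusion could be wrapped in a small function. The sketch below is only one possible arrangement: `widen_all_except()` is a hypothetical helper (not part of tidyr), and the `toy` data frame is made up for illustration. It assumes the identifier columns and the names column are passed as character vectors.

```r
library(tidyr)

# Hypothetical helper (not part of tidyr): widen every column except the
# identifier columns and the column that supplies the new name suffixes.
widen_all_except <- function(data, id_names, names_col) {
  pivot_wider(
    data,
    id_cols     = all_of(id_names),
    names_from  = all_of(names_col),
    values_from = -all_of(c(id_names, names_col))
  )
}

# Made-up example data with the identifier columns in non-adjacent positions.
toy <- data.frame(
  v1    = c(10, 11, 12, 13),
  ID    = c(1, 1, 2, 2),
  rater = c("a", "b", "a", "b"),
  v2    = c(20, 21, 22, 23),
  time  = c(1, 1, 1, 1)
)

wide <- widen_all_except(toy, id_names = c("ID", "time"), names_col = "rater")
names(wide)
```

Because the columns are excluded by name rather than by position, the call does not depend on where ID, time, and rater sit in the data frame.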
