简体   繁体   English

尽管 r 中有独特的列标题,但仍通过位置模式堆叠列

[英]stacking columns through positional patterns despite unique column headers in r

I have a series of columns in my data frame (299 of them) which have the same information type repeated every 3 columns.我的数据框中有一系列列(其中 299 个),它们每 3 列重复一次相同的信息类型。 I need to stack them so each set of 3 retain their referential integrity with each other, and the preceding 22 columns.我需要将它们堆叠起来,以便每组 3 保持它们彼此之间的参照完整性,以及前面的 22 列。 Normally I'd use melt() to go from wide to long form, but that would involve one or two categories with common names.通常我会使用melt() 从宽格式到长格式,但这会涉及一两个具有通用名称的类别。 Each of these columns has a unique name (eg "x1", "y1", "z1","x2","y2","z2").这些列中的每一列都有一个唯一的名称(例如“x1”、“y1”、“z1”、“x2”、“y2”、“z2”)。

For the sake of example, lets take example data mtcars.为了举例,让我们以数据 mtcars 为例。 These data contain the 11 columns, each with unique names: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb.这些数据包含 11 列,每列都有唯一的名称:mpg、cyl、disp、hp、drat、wt、qsec、vs、am、gear、carb。 Lets say I want to keep columns 1:5 as identifiers, then stack the data columns 6:8 on top of 9:11.假设我想保留 1:5 列作为标识符,然后将数据列 6:8 堆叠在 9:11 之上。 In this simple example I could use stack or melt and say which columns to use, but when we extrapolate out to hundreds of columns this gets to be a pretty horrid task very quickly... thoughts?在这个简单的例子中,我可以使用 stack 或 Melt 并说明要使用哪些列,但是当我们推断出数百列时,这很快就会成为一项非常可怕的任务......想法?

Given that your column names all share a prefix, tidyr::pivot_longer makes this easy.鉴于您的列名都共享一个前缀, tidyr::pivot_longer使这变得容易。 First I'll create some sample data based on your description:首先,我将根据您的描述创建一些示例数据:

## semi-realistic sample data
nr = 3
n_group = 4
n_per_group = 3
nc = n_group * n_per_group
df = as.data.frame(matrix(rnorm(nr * nc), nrow = nr))
names(df) = outer(letters[1:n_group], 1:n_per_group, paste, sep = ".")
df = cbind(id = 1:nr, df)
#   id        a.1         b.1         c.1         d.1        a.2       b.2        c.2        d.2
# 1  1  0.7898394 -2.34853091 -0.15575050 -1.18205111 -0.6348410  2.545300  0.8265652  0.8817947
# 2  2  0.5717691 -1.46586503  0.01206218  0.83525523  1.6629318  1.602706 -1.5781837  1.1245715
# 3  3 -1.5727813  0.04925075 -0.82081511 -0.06905755 -0.4137177 -1.734805 -0.3742163 -1.0701552
#         a.3        b.3         c.3         d.3
# 1 0.3238154  2.2401514 -0.70958661 -0.09062723
# 2 0.7016303 -0.9547929  0.23150225  0.39295502
# 3 0.7229709 -1.7137527  0.09044422 -0.59431548

With tidyr :使用tidyr

## convert to long format
library(tidyr)
pivot_longer(df, cols = -id, names_sep = "\\.", names_to = c(".value", "iteration"))
# # A tibble: 9 x 6
#      id iteration      a       b       c       d
#   <int> <chr>      <dbl>   <dbl>   <dbl>   <dbl>
# 1     1 1          0.790 -2.35   -0.156  -1.18  
# 2     1 2         -0.635  2.55    0.827   0.882 
# 3     1 3          0.324  2.24   -0.710  -0.0906
# 4     2 1          0.572 -1.47    0.0121  0.835 
# 5     2 2          1.66   1.60   -1.58    1.12  
# 6     2 3          0.702 -0.955   0.232   0.393 
# 7     3 1         -1.57   0.0493 -0.821  -0.0691
# 8     3 2         -0.414 -1.73   -0.374  -1.07  
# 9     3 3          0.723 -1.71    0.0904 -0.594 

Using the same sample data, here's a data.table::melt approach based on column numbers:使用相同的示例数据,这是一个基于列号的data.table::melt方法:

## note that `nc` is the number of non-ID columns.
## The `1:nc + 1` is a way to get the column numbers of the stack columns
## (+ 1 because there is 1 id column in my example)
dt = as.data.table(df)
melt(dt, id.vars = "id", measure.vars = split(1:nc + 1, rep(1:n_group, times = n_per_group)))
#    id variable          1           2           3           4
# 1:  1        1  0.7898394 -2.34853091 -0.15575050 -1.18205111
# 2:  2        1  0.5717691 -1.46586503  0.01206218  0.83525523
# 3:  3        1 -1.5727813  0.04925075 -0.82081511 -0.06905755
# 4:  1        2 -0.6348410  2.54530008  0.82656523  0.88179474
# 5:  2        2  1.6629318  1.60270621 -1.57818365  1.12457150
# 6:  3        2 -0.4137177 -1.73480524 -0.37421630 -1.07015525
# 7:  1        3  0.3238154  2.24015140 -0.70958661 -0.09062723
# 8:  2        3  0.7016303 -0.95479285  0.23150225  0.39295502
# 9:  3        3  0.7229709 -1.71375273  0.09044422 -0.59431548

(The results look different at a glance, but it's just a different order of the same rows.) (结果乍一看不一样,但只是相同行的顺序不同。)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM