How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns based on arbitrary values?

I have a column:

Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

I would like to split it into multiple columns, based on the positions of the column values. For instance, I would like:

Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)

Since I am working with a large time-series data set, the divisions will be arbitrary, depending on the length of one time period.

Not a dplyr solution, but I believe the easiest way would involve using matrices.

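foo <- function(data, sep.in = 5) {
  # matrix() fills column by column, so nrow = sep.in puts
  # sep.in consecutive values into each column.
  # length(data) should be a multiple of sep.in.
  data.matrix <- matrix(data, nrow = sep.in)
  data.df <- as.data.frame(data.matrix)
  return(data.df)
}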

I have not tested it, but this function should create a data.frame that can be merged into an existing one using cbind().
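
As a sketch of how the matrix approach might cope with a period length that does not divide the series evenly, the version below pads the tail with NA before reshaping. The function name split_into_columns and its period argument are illustrative and not part of the original answer.

```r
# Hypothetical helper: reshape a vector into columns of `period` values each.
split_into_columns <- function(x, period = 5) {
  n_cols <- ceiling(length(x) / period)
  # Pad with NA so the length is an exact multiple of `period`
  padded <- c(x, rep(NA, n_cols * period - length(x)))
  # matrix() fills column by column, so each column is one consecutive block
  out <- as.data.frame(matrix(padded, nrow = period))
  names(out) <- paste0("Y", seq_len(n_cols))
  out
}

res <- split_into_columns(1:20, period = 5)
res_ragged <- split_into_columns(1:7, period = 5)
```

Here res has columns Y1 to Y4 of five values each, and res_ragged has a second column that ends in NA. Whether NA padding is appropriate depends on how the incomplete last period should be treated.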

You can use the base split to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.

Using rep with each = 5, and creating the sequence programmatically, gets you the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each repeated 5 times consecutively. Then split returns a list of vectors.

It's worth noting that a variety of SO posts recommend storing similar data in a list such as this rather than creating multiple variables, so I'm leaving it in list form here.

Y <- 1:20

breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#> 
#> $`2`
#> [1]  6  7  8  9 10
#> 
#> $`3`
#> [1] 11 12 13 14 15
#> 
#> $`4`
#> [1] 16 17 18 19 20

Created on 2019-02-12 by the reprex package (v0.2.1)
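
The interval length can be kept in a variable, as suggested above. A minimal sketch, assuming a variable named period (an illustrative name) and using ceiling() so that a series whose length is not a multiple of the period still splits, with a shorter final group:

```r
Y <- 1:20
period <- 5  # illustrative name for the interval length

# Group index: 1 for the first `period` elements, 2 for the next, and so on
breaks <- ceiling(seq_along(Y) / period)
chunks <- split(Y, breaks)

# A length that is not a multiple of `period` yields a shorter last chunk
chunks_ragged <- split(1:7, ceiling(seq_along(1:7) / period))
```

Unlike rep(..., each = 5), this index always has the same length as the input, so split does not need the length to divide evenly.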

We can use split to split the vector into a list of vectors (the output is shown as comments).

lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5

#$`2`
#[1]  6  7  8  9 10

#$`3`
#[1] 11 12 13 14 15

#$`4`
#[1] 16 17 18 19 20

Here, gl creates a grouping index from the n, k and length arguments, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.

In our case, we want 'k' to be 5.

as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
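
Because the result is cut to length, n only needs to be at least the number of groups, so the same index can be built with the group count computed directly. The sketch below is one way to do that; the names k and n_groups are illustrative.

```r
Y <- 1:20
k <- 5                              # replications per level (block length)
n_groups <- ceiling(length(Y) / k)  # number of levels actually needed

# gl(n, k) yields a factor: level 1 repeated k times, then level 2, and so on
idx <- as.integer(gl(n_groups, k, length = length(Y)))
lst2 <- split(Y, idx)
```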

If we want to have multiple objects in the global environment, use list2env:

list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1]  6  7  8  9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
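
If the goal is columns in a data frame rather than separate objects, the named list can also be converted directly, provided every chunk has the same length. A minimal sketch in base R (the name wide is illustrative):

```r
Y <- 1:20
lst <- split(Y, rep(1:4, each = 5))

# Name the chunks Y1..Y4, then bind them as columns of a data frame
wide <- as.data.frame(setNames(lst, paste0("Y", seq_along(lst))))
```

This keeps the pieces together in one object, which is usually easier to pass around than several global variables.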

Or, as the OP mentioned dplyr/tidyr in the question, we can use those packages as well:

library(tidyverse)
tibble(Y) %>%
   group_by(grp = (row_number()-1) %/% 5 + 1) %>% 
   summarise(Y = list(Y)) %>%
   pull(Y)
#[[1]]
#[1] 1 2 3 4 5

#[[2]]
#[1]  6  7  8  9 10

#[[3]]
#[1] 11 12 13 14 15

#[[4]]
#[1] 16 17 18 19 20

data

Y <- 1:20
