How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns based on arbitrary values?
I have a column:
Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
I would like to split it into multiple columns, based on the positions of the column values. For instance, I would like:
Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)
Since I am working with a large time series data set, the divisions will be arbitrary, depending on the length of one time period.
Not a dplyr solution, but I believe the easiest way would involve using matrices.
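foo = function(data, sep.in = 5) {
  # matrix() fills column by column, so with sep.in rows each column
  # holds one block of sep.in consecutive values.
  # Assumes length(data) is a multiple of sep.in.
  data.matrix = matrix(data, nrow = sep.in)
  data.df = as.data.frame(data.matrix)
  return(data.df)
}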
I have not tested it, but this function should create a data.frame that can be merged into an existing one using cbind().
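As a rough sketch of the matrix idea above, the helper below reshapes a vector into one column per period and binds the result to an existing data frame. The names split_into_columns, period, and existing are hypothetical, and the sketch assumes the vector length is an exact multiple of the period length.

```r
# Hypothetical helper: one column per block of `period` consecutive values.
split_into_columns <- function(x, period) {
  # Stop early instead of letting matrix() recycle values silently
  stopifnot(length(x) %% period == 0)
  # matrix() fills column by column, so each column is one period
  m <- matrix(x, nrow = period)
  df <- as.data.frame(m)
  names(df) <- paste0("Y", seq_len(ncol(df)))
  df
}

Y <- 1:20
wide <- split_into_columns(Y, period = 5)

# Bind to an existing data frame that has the same number of rows
existing <- data.frame(t = 1:5)
combined <- cbind(existing, wide)
```

Setting nrow rather than ncol is what keeps consecutive values together in the same column.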
You can use the base function split() to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.
Using rep() with each = 5, and creating a sequence programmatically, gets you a sequence of the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each repeated 5 times consecutively. Then split() returns a list of vectors.
It's worth noting that a variety of SO posts will recommend you store similar data in lists such as this, rather than creating multiple variables, so I'm leaving it in list form here.
Y <- 1:20
breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#>
#> $`2`
#> [1] 6 7 8 9 10
#>
#> $`3`
#> [1] 11 12 13 14 15
#>
#> $`4`
#> [1] 16 17 18 19 20
Created on 2019-02-12 by the reprex package (v0.2.1)
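The suggestion above to store the interval length in a variable could look like the sketch below. The name period is hypothetical, and ceiling(seq_along(Y) / period) is swapped in for rep() so that a vector whose length is not a multiple of the period still works; the last chunk is simply shorter.

```r
Y <- 1:23
period <- 5  # hypothetical name for the length of one time period

# Positions 1..5 map to group 1, 6..10 to group 2, and so on
breaks <- ceiling(seq_along(Y) / period)
chunks <- split(Y, breaks)

# Number of values in each chunk
chunk_sizes <- lengths(chunks)
```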
We can make use of split() (with the output shown as comments) to split the vector into a list of vectors.
lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5
#$`2`
#[1] 6 7 8 9 10
#$`3`
#[1] 11 12 13 14 15
#$`4`
#[1] 16 17 18 19 20
Here, gl() creates a grouping index from the n, k and length parameters, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.
In our case, we want to have 'k' as 5.
as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
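As a small variation on the gl() call above, the sketch below sets n to the number of groups actually needed rather than length(Y). The names k, n_groups, and lst2 are hypothetical. With n = length(Y) the factor carries 20 levels, which is presumably why the original wraps it in as.integer(): split() keeps empty factor levels by default.

```r
Y <- 1:20
k <- 5  # number of consecutive values per group

# Only as many levels as there are groups
n_groups <- ceiling(length(Y) / k)
g <- gl(n_groups, k, length(Y))

# The factor can be passed to split() directly, with no unused levels
lst2 <- split(Y, g)

# For comparison: the factor from gl(length(Y), ...) has one level per element
n_levels_original <- nlevels(gl(length(Y), k, length(Y)))
```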
If we want to have multiple objects in the global environment, use list2env():
list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1] 6 7 8 9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
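If filling the global environment with many objects is a concern, one possible alternative is to pass list2env() a fresh environment instead. This is only a sketch; the name periods is hypothetical, and the grouping uses ceiling() so the block runs on its own.

```r
Y <- 1:20
lst <- split(Y, ceiling(seq_along(Y) / 5))

# Store Y1, Y2, ... in a dedicated environment rather than .GlobalEnv
periods <- list2env(setNames(lst, paste0("Y", seq_along(lst))),
                    envir = new.env())

# Objects are then accessed through the environment
object_names <- sort(ls(periods))
second_period <- periods$Y2
```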
Or, since the OP mentioned dplyr/tidyr in the question, we can use those packages as well:
library(tidyverse)
tibble(Y) %>%
group_by(grp = (row_number()-1) %/% 5 + 1) %>%
summarise(Y = list(Y)) %>%
pull(Y)
#[[1]]
#[1] 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 10
#[[3]]
#[1] 11 12 13 14 15
#[[4]]
#[1] 16 17 18 19 20
Y <- 1:20
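Since the question asks for separate columns rather than a list, a possible tidyr variant is sketched below. It assumes tidyr 1.0.0 or later for pivot_wider(), which is newer than the answers above, and the names period, grp, pos, and wide are hypothetical.

```r
library(dplyr)
library(tidyr)

Y <- 1:20
period <- 5  # hypothetical name for the length of one time period

wide <- tibble(Y) %>%
  mutate(
    grp = (row_number() - 1) %/% period + 1,  # which period a value belongs to
    pos = (row_number() - 1) %% period + 1    # position within that period
  ) %>%
  # One column per period: Y1, Y2, ..., with one row per position
  pivot_wider(names_from = grp, values_from = Y, names_prefix = "Y")
```

The pos column serves as the row identifier, so each period lines up by position within the period.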