Convert specific number of rows to columns in R and repeat the process for a large dataset

I have a dataset with a single column and 15 million rows. It looks like this:

x_raw
A1
A2
A3
A4
B1
B2
B3
B4
C1
C2

I want to convert it to

A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4

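I am trying to use a `for` loop that transposes every 4 rows and appends them to a "final" data frame, but because the dataset is so large it iterates nearly 2.7 million times, which is not efficient. Is there another approach I can use to do this efficiently?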
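Because the target cell of every value depends only on its index, the loop can in principle be replaced by one vectorized assignment. A minimal base R sketch, assuming each output row holds `k = 4` consecutive values and that an incomplete last row should be padded with `NA` (the names `k`, `i` and `out` are illustrative):

```r
x_raw <- c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2")
k <- 4                                   # values per output row (assumed fixed)
i <- seq_along(x_raw) - 1                # zero-based index of each value
# Preallocate the result instead of growing a data frame inside a loop
out <- matrix(NA_character_, nrow = ceiling(length(x_raw) / k), ncol = k)
# Row = i %/% k + 1, column = i %% k + 1; a single assignment fills every cell
out[cbind(i %/% k + 1, i %% k + 1)] <- x_raw
out
```

Cells that receive no value (here the last two in row 3) stay `NA`.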

Here is an option with the tidyverse, where we `separate` "x_raw" into two columns and then `spread` to "wide" format:

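library(dplyr)
library(tidyr)
# Split "A1" into the letter part (x) and the trailing number (rn), keeping x_raw
separate(df1, x_raw, into = c('x', 'rn'), sep="(?=\\d+)", remove = FALSE) %>%
       spread(rn, x_raw) %>%   # one column per number; missing cells become NA
       select(-x)              # drop the helper letter column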
#   1  2    3    4
#1 A1 A2   A3   A4
#2 B1 B2   B3   B4
#3 C1 C2 <NA> <NA>
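For readers who would rather avoid the tidyverse dependency, the same split-then-widen idea can be sketched with base R's `stats::reshape()`. Like the answer above, this assumes every value is a letter prefix followed by its position number; the helper columns `x` and `rn` mirror the names used there:

```r
df1 <- data.frame(x_raw = c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2"),
                  stringsAsFactors = FALSE)
# Split each value into its group letters and its position, as separate() does
df1$x  <- sub("\\d+$", "", df1$x_raw)    # "A1" -> "A"
df1$rn <- sub("^\\D+", "", df1$x_raw)    # "A1" -> "1"
# One row per group, one column per position; missing positions become NA
wide <- reshape(df1, idvar = "x", timevar = "rn", v.names = "x_raw",
                direction = "wide")
wide$x <- NULL                            # drop the helper key, like select(-x)
wide
```

The resulting columns are named `x_raw.1` to `x_raw.4`.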

Or, if every group always has exactly 4 elements, we can also do:

as.data.frame(matrix(df1$x_raw, ncol =4, byrow = TRUE), stringsAsFactors=FALSE)
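The `matrix()` approach relies on the total length being a multiple of 4. If it is not, as with the 10-value sample in the question, `matrix()` emits a warning and recycles values from the start of the vector rather than padding with `NA`. A quick check:

```r
x_raw <- c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2")
# 10 values cannot fill 3 rows of 4, so matrix() warns (suppressed here) and recycles
m <- suppressWarnings(matrix(x_raw, ncol = 4, byrow = TRUE))
m[3, ]
# [1] "C1" "C2" "A1" "A2"
```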

If you just want to convert it to a four-column data frame:

as.data.frame(matrix(df$x_raw, ncol = 4, byrow = TRUE))

See this example:

x_raw <- c("A1","A2","A3","A4","B1","B2","B3","B4","C1","C2","C3","C4","D1","D2","D3","D4")
x <- as.table(matrix(x_raw, ncol = 4, byrow = TRUE))
rownames(x) <- NULL
colnames(x) <- NULL
print(x)

It returns:

     [,1] [,2] [,3] [,4]
[1,] A1   A2   A3   A4
[2,] B1   B2   B3   B4  
[3,] C1   C2   C3   C4 
[4,] D1   D2   D3   D4

Extend the length to the next full block of 4 values and put the result into a matrix:

matrix(`length<-`(dat$x_raw, ceiling(nrow(dat) / 4) * 4), ncol=4, byrow=TRUE)

#     [,1] [,2] [,3] [,4]
#[1,] "A1" "A2" "A3" "A4"
#[2,] "B1" "B2" "B3" "B4"
#[3,] "C1" "C2" NA   NA
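Padding with `(n %/% 4 + 1) * 4` adds an extra all-`NA` row when the length is already a multiple of 4; rounding up with `ceiling()` avoids that. A sketch wrapping this into a reusable function (`rows_to_cols` is a hypothetical name, not part of any package):

```r
# Hypothetical helper: reshape a vector into rows of k values,
# padding only the last incomplete row with NA
rows_to_cols <- function(x, k = 4) {
  n_rows <- ceiling(length(x) / k)   # whole rows needed, no spare NA row
  length(x) <- n_rows * k            # pad with NA; no-op if already a multiple of k
  as.data.frame(matrix(x, ncol = k, byrow = TRUE), stringsAsFactors = FALSE)
}

x10 <- c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2")
rows_to_cols(x10)   # 3 rows; V3 and V4 are NA in the last row
```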


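Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.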
