Convert multiple rows into single column in R

Using R, I have a large data frame, of which the following is an example:

df = data.frame(
  X1 = c("02JAN2008", "09:30 - 10:00", "10:00 - 10:30", "10:30 - 11:00",
         "11:00 - 11:30", "15:30 - 16:00", "16:00 - 16:30",
         "03JAN2008", "09:30 - 10:00", "10:00 - 10:30", "10:30 - 11:00",
         "11:00 - 11:30"),
  X2 = c(NA, 1469.37, 1459.91, 1456.92, 1453.48, 1447.22, 1447.16,
         NA, 1449.78, 1451.21, 1450.08, 1452.16),
  X3 = c(NA, 1467.97, 1467.11, 1459.76, 1457.00, 1444.00, 1447.67,
         NA, 1447.55, 1450.66, 1452.06, 1450.01)
)

which looks like:

              X1      X2      X3
1      02JAN2008      NA      NA
2  09:30 - 10:00 1469.37 1467.97
3  10:00 - 10:30 1459.91 1467.11
4  10:30 - 11:00 1456.92 1459.76
5  11:00 - 11:30 1453.48 1457.00
6  15:30 - 16:00 1447.22 1444.00
7  16:00 - 16:30 1447.16 1447.67
8      03JAN2008      NA      NA
9  09:30 - 10:00 1449.78 1447.55
10 10:00 - 10:30 1451.21 1450.66
11 10:30 - 11:00 1450.08 1452.06
12 11:00 - 11:30 1452.16 1450.01

Due to missing data, some days might have 6 observations while others might have only 4 (or fewer; this is just an example).

I would like to transform this into a data frame with the date in a separate column alongside each 30-minute interval, such as:

          X1            X2      X3      X4
1  02JAN2008 09:30 - 10:00 1469.37 1467.97
2  02JAN2008 10:00 - 10:30 1459.91 1467.11
3  02JAN2008 10:30 - 11:00 1456.92 1459.76
4  02JAN2008 11:00 - 11:30 1453.48 1457.00
5  02JAN2008 15:30 - 16:00 1447.22 1444.00
6  02JAN2008 16:00 - 16:30 1447.16 1447.67
7  03JAN2008 09:30 - 10:00 1449.78 1447.55
8  03JAN2008 10:00 - 10:30 1451.21 1450.66
9  03JAN2008 10:30 - 11:00 1450.08 1452.06
10 03JAN2008 11:00 - 11:30 1452.16 1450.01

I could easily grab the indexes of df where X2 is NA and then write a for loop that carries the date forward, but I would like to avoid a for loop in R.

How can I do this in R? Surely a dplyr or tidyr solution is available, but I can't produce one from the examples in the documentation. Or perhaps some version of melt?

Here's an option:

library(data.table)
dt = as.data.table(df) # or setDT to convert in place

dt[, grp := cumsum(is.na(X2))][, c(date = list(X1[1]), tail(.SD, -1)), by = grp]
#    grp      date            X1      X2      X3
# 1:   1 02JAN2008 09:30 - 10:00 1469.37 1467.97
# 2:   1 02JAN2008 10:00 - 10:30 1459.91 1467.11
# 3:   1 02JAN2008 10:30 - 11:00 1456.92 1459.76
# 4:   1 02JAN2008 11:00 - 11:30 1453.48 1457.00
# 5:   1 02JAN2008 15:30 - 16:00 1447.22 1444.00
# 6:   1 02JAN2008 16:00 - 16:30 1447.16 1447.67
# 7:   2 03JAN2008 09:30 - 10:00 1449.78 1447.55
# 8:   2 03JAN2008 10:00 - 10:30 1451.21 1450.66
# 9:   2 03JAN2008 10:30 - 11:00 1450.08 1452.06
#10:   2 03JAN2008 11:00 - 11:30 1452.16 1450.01
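
To see why the grouping in this answer works, here is a minimal base R sketch on a shortened copy of the question's data. The names small, is_date, grp and dates are only illustrative and do not appear in the original answer. Every date row has NA in X2, so the running sum of is.na(X2) goes up by one at each date row and stays constant for the intervals below it.

```r
# Shortened copy of the question's data: date rows carry NA in X2 and X3.
small <- data.frame(
  X1 = c("02JAN2008", "09:30 - 10:00", "10:00 - 10:30",
         "03JAN2008", "09:30 - 10:00"),
  X2 = c(NA, 1469.37, 1459.91, NA, 1449.78),
  X3 = c(NA, 1467.97, 1467.11, NA, 1447.55),
  stringsAsFactors = FALSE
)

# TRUE on date rows; the running sum labels each day's block of rows.
is_date <- is.na(small$X2)
grp <- cumsum(is_date)
grp
# [1] 1 1 1 2 2

# The first X1 value inside each block is that block's date.
dates <- tapply(small$X1, grp, function(x) x[1])
```

The data.table call does the same thing per group: X1[1] picks the date and tail(.SD, -1) keeps every row of the block except the first one.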

Here's a dplyr way:

library(dplyr)

breaks <- is.na(df$X2)   # TRUE on the date-only rows
df %>%
    mutate(date=X1[breaks][cumsum(breaks)]) %>%
    filter(!breaks)

#               X1   X2   X3      date
# 1  09:30 - 10:00 1469 1468 02JAN2008
# 2  10:00 - 10:30 1460 1467 02JAN2008
# 3  10:30 - 11:00 1457 1460 02JAN2008
# 4  11:00 - 11:30 1453 1457 02JAN2008
# 5  15:30 - 16:00 1447 1444 02JAN2008
# 6  16:00 - 16:30 1447 1448 02JAN2008
# 7  09:30 - 10:00 1450 1448 03JAN2008
# 8  10:00 - 10:30 1451 1451 03JAN2008
# 9  10:30 - 11:00 1450 1452 03JAN2008
# 10 11:00 - 11:30 1452 1450 03JAN2008
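
Since the question also asks about tidyr, a possible variant is sketched below. It is not part of the original answer and assumes both dplyr and tidyr are installed; small and result are illustrative names, and small is a shortened copy of the question's data. The idea is to copy X1 into a new date column only on the date rows, let tidyr::fill() carry that value downward, and then drop the date rows.

```r
library(dplyr)
library(tidyr)

# Shortened copy of the question's data: date rows carry NA in X2 and X3.
small <- data.frame(
  X1 = c("02JAN2008", "09:30 - 10:00", "10:00 - 10:30",
         "03JAN2008", "09:30 - 10:00"),
  X2 = c(NA, 1469.37, 1459.91, NA, 1449.78),
  X3 = c(NA, 1467.97, 1467.11, NA, 1447.55),
  stringsAsFactors = FALSE
)

result <- small %>%
  # Keep X1 only on the date rows, NA everywhere else.
  mutate(date = ifelse(is.na(X2), X1, NA_character_)) %>%
  # Carry the last seen date down over the interval rows.
  fill(date) %>%
  # Drop the date-only rows and put the date first.
  filter(!is.na(X2)) %>%
  select(date, X1, X2, X3)
```

Like the answer above, this treats any row with NA in X2 as a date row, so it would mislabel an interval row whose X2 value is genuinely missing.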

Or just as simply in base R, reusing breaks from above:

df <- within(df, date <- X1[breaks][cumsum(breaks)])
df[! breaks, ]

One way is with na.locf from zoo:

require(zoo)
df0<-cbind(df$X1,df)
df0[!is.na(df0[,3]),1]<-NA
df0[,1]<-na.locf(df0[,1])
df0<-df0[!is.na(df0[,3]),]

Which gives:

> df0    
       df$X1            X1      X2      X3
2  02JAN2008 09:30 - 10:00 1469.37 1467.97
3  02JAN2008 10:00 - 10:30 1459.91 1467.11
4  02JAN2008 10:30 - 11:00 1456.92 1459.76
5  02JAN2008 11:00 - 11:30 1453.48 1457.00
6  02JAN2008 15:30 - 16:00 1447.22 1444.00
7  02JAN2008 16:00 - 16:30 1447.16 1447.67
9  03JAN2008 09:30 - 10:00 1449.78 1447.55
10 03JAN2008 10:00 - 10:30 1451.21 1450.66
11 03JAN2008 10:30 - 11:00 1450.08 1452.06
12 03JAN2008 11:00 - 11:30 1452.16 1450.01

A base R option would be:

df$X1 <- as.character(df$X1)
indx <- !grepl(':', df$X1)
res <- setNames(data.frame(unlist(tapply(df$X1[indx][cumsum(indx)], 
          cumsum(indx), FUN=head, -1)), df[!indx,]), paste0("X",1:4))
row.names(res) <- NULL
res
#          X1            X2      X3      X4
#1  02JAN2008 09:30 - 10:00 1469.37 1467.97
#2  02JAN2008 10:00 - 10:30 1459.91 1467.11
#3  02JAN2008 10:30 - 11:00 1456.92 1459.76
#4  02JAN2008 11:00 - 11:30 1453.48 1457.00
#5  02JAN2008 15:30 - 16:00 1447.22 1444.00
#6  02JAN2008 16:00 - 16:30 1447.16 1447.67
#7  03JAN2008 09:30 - 10:00 1449.78 1447.55
#8  03JAN2008 10:00 - 10:30 1451.21 1450.66
#9  03JAN2008 10:30 - 11:00 1450.08 1452.06
#10 03JAN2008 11:00 - 11:30 1452.16 1450.01

Or:

res2 <- do.call(rbind,lapply(Map(cbind, df$X1[indx],split(df[!indx,], 
           cumsum(indx)[!indx])), setNames, paste0('X', 1:4)))
row.names(res2) <- NULL

I have tried this:

> na_ind <- which(is.na(df$X2))
> day_break <- c(na_ind, nrow(df) + 1)
> day_count <- day_break[-1] - day_break[-length(day_break)] -1
> day_count
## [1] 6 4
> new_df <- cbind(date = rep(df$X1[na_ind], times = day_count),
+                 df[-na_ind,])
> new_df
## date            X1      X2      X3
## 2  02JAN2008 09:30 - 10:00 1469.37 1467.97
## 3  02JAN2008 10:00 - 10:30 1459.91 1467.11
## 4  02JAN2008 10:30 - 11:00 1456.92 1459.76
## 5  02JAN2008 11:00 - 11:30 1453.48 1457.00
## 6  02JAN2008 15:30 - 16:00 1447.22 1444.00
## 7  02JAN2008 16:00 - 16:30 1447.16 1447.67
## 9  03JAN2008 09:30 - 10:00 1449.78 1447.55
## 10 03JAN2008 10:00 - 10:30 1451.21 1450.66
## 11 03JAN2008 10:30 - 11:00 1450.08 1452.06
## 12 03JAN2008 11:00 - 11:30 1452.16 1450.01
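
The per-day counts in this approach can also be computed with diff(), which might read a little more directly than subtracting two shifted vectors. The sketch below is an illustration on a shortened copy of the question's data, not the answerer's code; small and new_small are hypothetical names.

```r
# Shortened copy of the question's data: date rows carry NA in X2 and X3.
small <- data.frame(
  X1 = c("02JAN2008", "09:30 - 10:00", "10:00 - 10:30",
         "03JAN2008", "09:30 - 10:00"),
  X2 = c(NA, 1469.37, 1459.91, NA, 1449.78),
  X3 = c(NA, 1467.97, 1467.11, NA, 1447.55),
  stringsAsFactors = FALSE
)

# Positions of the date rows.
na_ind <- which(is.na(small$X2))

# Gap between consecutive date rows (with a sentinel past the last row),
# minus one for the date row itself, gives the observations per day.
day_count <- diff(c(na_ind, nrow(small) + 1)) - 1

# Repeat each date once per observation and bind it to the data rows.
new_small <- cbind(date = rep(small$X1[na_ind], times = day_count),
                   small[-na_ind, ])
```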
