Extracting columns from dataframes in a list to produce a final, combined dataframe
Sorry, I'm coming back to R lists and data frames after a while, so I've forgotten my way around. Suppose I have several data frames in a list:
d2<- data.frame(week=c("12th","13th","14th"),value=c(1,20,100))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,30))
dfList<- list(d1,d2,d3)
dfList
[[1]]
week value
1 12th 1
2 13th 10
3 14th 15
[[2]]
week value
1 12th 1
2 13th 20
3 14th 100
[[3]]
week value
1 12th 1
2 13th 220
3 14th 30
I'd like a final data frame containing the combined data, with the following shape:
finalDf<- data.frame(week=c("12th","13th","14th"),value1=c(1,20,100),value2=c(1,10,15),value3=c(1,220,30))
week value1 value2 value3
1 12th 1 1 1
2 13th 20 10 220
3 14th 100 15 30
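How can I get the data into the form above? Also, what if my initial data frames contain NAs that I want to remove before producing the final form?
Many thanks.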
I see cbind strategies here, but they may fail if there are missing values, so I thought the merge approach should be illustrated:
Reduce( function(x,y) merge(x, y, by="week"), dfList)
week value.x value.y value
1 12th 1 1 1
2 13th 10 20 220
3 14th 15 100 30
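If you want to keep all possible NA values (for example, rows for weeks that are missing from some of the data frames), you may need to add the all.x=TRUE argument.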
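To make the missing-values concern concrete, here is a small sketch using hypothetical data in which d3 has no row for "14th". The default merge() is an inner join and silently drops that week, while all.x = TRUE keeps it and fills the gap with NA. Note that inside Reduce() the left-hand side is the accumulated result, so a week absent from the first data frame would still be lost; all = TRUE avoids that.

```r
# Hypothetical data: d3 has no row for "14th"
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th"), value = c(1, 220))
dfList <- list(d1, d2, d3)

# Default merge(): inner join, keeps only weeks present in every data frame
inner <- Reduce(function(x, y) merge(x, y, by = "week"), dfList)

# all.x = TRUE: keeps every week of the accumulated left-hand side,
# filling the value missing from d3 with NA
outer <- Reduce(function(x, y) merge(x, y, by = "week", all.x = TRUE), dfList)

nrow(inner)  # 2
nrow(outer)  # 3
```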
> cbind(dfList[[1]], lapply(dfList[2:3], `[`, "value"))
week value value value
1 12th 1 1 1
2 13th 10 20 220
3 14th 15 100 30
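It looks like the numbering of the data frames differs between your data setup and your expected result, but any version of this code (changing the relevant extraction indices in `[` and `[[`) will get you to the expected structure.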
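A sketch of that reindexing, assuming the expected result really wants d2, d1, d3 in that order and the columns named value1 to value3. The ord vector is a hypothetical helper, not part of the answer. Like any cbind-style approach, it assumes every data frame lists the same weeks in the same order.

```r
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, 30))
dfList <- list(d1, d2, d3)

# Hypothetical ordering that matches the expected result: d2, d1, d3
ord <- c(2, 1, 3)

# Take the week column once, then the value column of each data frame in order
finalDf <- data.frame(week = dfList[[1]]$week,
                      sapply(dfList[ord], `[[`, "value"))
names(finalDf)[-1] <- paste0("value", seq_along(ord))
finalDf
```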
You could try:
library(plyr)
join_all(dfList, by="week")
# week value value value
#1 12th 1 1 1
#2 13th 10 20 220
#3 14th 15 100 30
It also works with @Frank's data containing NAs:
res <- join_all(dfList, by="week")
res
# week value value value
#1 12th 1 NA 1
#2 13th 10 NA 220
#3 14th 15 NA NA
str(res)
#'data.frame': 3 obs. of 4 variables:
# $ week : Factor w/ 3 levels "12th","13th",..: 1 2 3
# $ value: num 1 10 15
# $ value: logi NA NA NA
# $ value: num 1 220 NA ##numeric columns
There are already some very good answers, but here's another:
Step 1: Stack your data.frames into one long data.frame:
dfDF <- do.call(rbind, dfList)
Step 2: Add a "time" variable indicating which list element each row came from. There are several ways to do this.
with(dfDF, ave(as.character(week), week, FUN = seq_along))
# [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"
rep(sequence(length(dfList)), vapply(dfList, nrow, 1L))
# [1] 1 1 1 2 2 2 3 3 3
dfDF$time <- with(dfDF, ave(as.character(week), week, FUN = seq_along))
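One more way, offered as an alternative rather than part of the original answer, is to tag each data frame with its list position before stacking, so steps 1 and 2 happen together. dfTagged is a hypothetical name. This version yields an integer time column instead of the character one produced by ave() above.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, 30))
dfList <- list(d1, d2, d3)

# cbind(df, time = i) adds a constant column holding the list position;
# Map() applies it to each data frame, then rbind stacks the results
dfTagged <- do.call(rbind, Map(cbind, dfList, time = seq_along(dfList)))
dfTagged$time
```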
Step 3: Use dcast to go from "long" to "wide".
library(reshape2)
dcast(dfDF, week ~ time, value.var = "value")
# week 1 2 3
# 1 12th 1 1 1
# 2 13th 10 20 220
# 3 14th 15 100 30
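If reshape2 isn't available, base R's reshape() can do step 3 as well; this is a swapped-in alternative, not what the answer uses. The sketch builds the time column with the rep()/vapply() variant from step 2. By default reshape() names the wide columns value.1, value.2, value.3.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, 30))
dfList <- list(d1, d2, d3)

# Steps 1 and 2: stack, then record which list element each row came from
dfDF <- do.call(rbind, dfList)
dfDF$time <- rep(seq_along(dfList), vapply(dfList, nrow, 1L))

# Step 3 in base R: one row per week, one value column per time
wide <- reshape(dfDF, idvar = "week", timevar = "time", direction = "wide")
wide
```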
df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)
df
# week value value.1 value.2
#1 12th 1 1 1
#2 13th 10 20 220
#3 14th 15 100 30
If you want the value columns to be numeric rather than character:
df[2:4] <- sapply(df[2:4], as.numeric)
df
# week value value.1 value.2
#1 12th 1 1 1
#2 13th 10 20 220
#3 14th 15 100 30
It also works with NAs:
d2<- data.frame(week=c("12th","13th","14th"),value=c(NA,NA,NA))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,NA))
dfList<- list(d1,d2,d3)
df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)
df
# week value value.1 value.2
#1 12th 1 <NA> 1
#2 13th 10 <NA> 220
#3 14th 15 <NA> <NA>
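None of the answers above actually drops the NAs, which the question also asks about. A minimal sketch, assuming "remove NAs" means dropping any row with an NA from each data frame and then keeping only weeks that still have a value everywhere: na.omit() each element, rename the value columns (value1, value2, ...; the names are an assumption), and merge with the default inner join. The data here is hypothetical, with d3 missing its "14th" value.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, NA))
dfList <- list(d1, d2, d3)

# Drop rows containing NA from each data frame before combining
cleanList <- lapply(dfList, na.omit)

# Rename each value column so merge() does not add .x / .y suffixes
cleanList <- Map(function(df, i) setNames(df, c("week", paste0("value", i))),
                 cleanList, seq_along(cleanList))

# Inner join (merge() default): keep only weeks with a value in every
# data frame, so the combined result contains no NAs
finalDf <- Reduce(function(x, y) merge(x, y, by = "week"), cleanList)
finalDf
```

Passing all = TRUE to merge() instead would keep "14th" and put NA back in value3.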