
Extracting columns from dataframes in a list to produce a final, combined dataframe

Apologies, I'm coming back to R lists and data frames after a while and have forgotten my way around. Suppose I have several data frames in a list:

d2<- data.frame(week=c("12th","13th","14th"),value=c(1,20,100))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,30))
dfList<- list(d1,d2,d3)

dfList
[[1]]
  week value
1 12th     1
2 13th    10
3 14th    15

[[2]]
  week value
1 12th     1
2 13th    20
3 14th   100

[[3]]
  week value
1 12th     1
2 13th   220
3 14th    30

I would like a final data frame containing the combined data, shaped like this:

finalDf<- data.frame(week=c("12th","13th","14th"),value1=c(1,20,100),value2=c(1,10,15),value3=c(1,220,30))

  week value1 value2 value3
1 12th      1      1      1
2 13th     20     10    220
3 14th    100     15     30

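How can I get the data into the form above? Also, if my initial data frames contain NAs, how would I remove them before producing the final data frame?

Thanks very much.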

I see the cbind strategies, but they may fail if there are missing values, so I think the merge approach should be illustrated:

 Reduce( function(x,y) merge(x, y, by="week"), dfList)
  week value.x value.y value
1 12th       1       1     1
2 13th      10      20   220
3 14th      15     100    30

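If you want to keep weeks that are missing from some of the data frames (filled in with NA), you may need to add the all.x=TRUE argument to merge.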
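To make the effect of that argument concrete, here is a minimal sketch. It assumes a hypothetical variant of d3 that lacks the "14th" row (not part of the original data), and it uses all=TRUE rather than all.x=TRUE so that unmatched rows from either side are kept.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
# Assumption for illustration: d3 has no row for "14th"
d3 <- data.frame(week = c("12th", "13th"), value = c(1, 220))
dfList <- list(d1, d2, d3)

# Default merge keeps only weeks present in every data frame
inner <- Reduce(function(x, y) merge(x, y, by = "week"), dfList)

# all = TRUE keeps every week and fills the gaps with NA
outer <- Reduce(function(x, y) merge(x, y, by = "week", all = TRUE), dfList)

nrow(inner)  # weeks common to all three data frames
nrow(outer)  # all weeks seen in any data frame
```

With the default, the "14th" row disappears from the result. With all=TRUE it is kept, and the column coming from d3 holds NA for that week.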

> cbind(dfList[[1]], lapply(dfList[2:3], `[`, "value"))
  week value value value
1 12th     1     1     1
2 13th    10    20   220
3 14th    15   100    30

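The numbering of your data frames appears to differ between your data setup and your expected result, but any version of this code (changing the relevant extraction indices in `[[` and `[`) should get you to the expected structure.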
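As a sketch of how the indices might be adjusted to reproduce the expected column order and names: the ordering vector ord and the value1..value3 names below are assumptions read off the question's expected output, and the approach relies on all data frames having their rows in the same week order.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, 30))
dfList <- list(d1, d2, d3)

# Assumption: the expected output lists d2's values first, then d1's, then d3's
ord <- c(2, 1, 3)

# Pull the value column out of each data frame as a plain vector
vals <- lapply(dfList[ord], `[[`, "value")
names(vals) <- paste0("value", seq_along(vals))

# Rows are matched by position, not by week
finalDf <- data.frame(week = dfList[[1]]$week, vals)
finalDf
```

Because the rows are combined by position, this is only safe when every data frame has the same weeks in the same order; otherwise a merge by week is the safer route.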

You could try:

 library(plyr)
 join_all(dfList, by="week")
 #  week value value value
 #1 12th     1     1     1
 #2 13th    10    20   220
 #3 14th    15   100    30

This also works with NAs, using @Frank's data:

  res <- join_all(dfList, by="week")
  res
  #  week value value value
  #1 12th     1    NA     1
  #2 13th    10    NA   220
  #3 14th    15    NA    NA

  str(res)
 #'data.frame': 3 obs. of  4 variables:
 # $ week : Factor w/ 3 levels "12th","13th",..: 1 2 3
 #$ value: num  1 10 15
 #$ value: logi  NA NA NA
 #$ value: num  1 220 NA  ##numeric columns
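The question also asks how to remove NAs before combining. One possible base-R sketch follows. It assumes that "removing" means dropping rows containing NA with na.omit() and skipping any data frame that ends up empty; the column names value1, value2 are hypothetical.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(NA, NA, NA))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, NA))
dfList <- list(d1, d2, d3)

# Drop rows with NA in each data frame
cleaned <- lapply(dfList, na.omit)

# Assumption: a data frame with no rows left contributes nothing, so skip it
cleaned <- Filter(function(d) nrow(d) > 0, cleaned)

# Give each value column a distinct (hypothetical) name before merging
cleaned <- Map(function(d, i) setNames(d, c("week", paste0("value", i))),
               cleaned, seq_along(cleaned))

# Full merge by week; weeks dropped from one data frame come back as NA
res <- Reduce(function(x, y) merge(x, y, by = "week", all = TRUE), cleaned)
res
```

Note that a wide result cannot avoid NA entirely when the data frames cover different weeks after cleaning; using all = FALSE instead would keep only the weeks present in every remaining data frame.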

There are already some very good answers, but here is another:

Step 1: Combine your data.frames into one long data.frame:

dfDF <- do.call(rbind, dfList)

Step 2: Add a "time" variable that indicates which list element each row came from. There are a few ways to do this.

with(dfDF, ave(as.character(week), week, FUN = seq_along))
# [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"

rep(sequence(length(dfList)), vapply(dfList, nrow, 1L))
# [1] 1 1 1 2 2 2 3 3 3

dfDF$time <- with(dfDF, ave(as.character(week), week, FUN = seq_along))

Step 3: Use dcast to go from "long" to "wide".

library(reshape2)
dcast(dfDF, week ~ time, value.var = "value")
#   week  1   2   3
# 1 12th  1   1   1
# 2 13th 10  20 220
# 3 14th 15 100  30
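The three steps above can also be sketched without extra packages. This variant swaps in base R's reshape() for dcast and uses the rep() form of the "time" variable; it is an alternative sketch, not the method the answer itself uses.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 20, 100))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, 30))
dfList <- list(d1, d2, d3)

# Step 1: stack the data frames into one long data frame
dfDF <- do.call(rbind, dfList)

# Step 2: label each row with the position of its source data frame
dfDF$time <- rep(seq_along(dfList), vapply(dfList, nrow, 1L))

# Step 3: long to wide, one value column per source data frame
wide <- reshape(dfDF, idvar = "week", timevar = "time", direction = "wide")
rownames(wide) <- NULL
wide
```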
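Another option: bind all the data frames column-wise, then drop the duplicated columns (here the repeated week column) by calling unique() on the transpose:

df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)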
df
#  week value value.1 value.2
#1 12th     1       1       1
#2 13th    10      20     220
#3 14th    15     100      30

If you want the value columns to be numeric rather than character:

df[2:4] <- sapply(df[2:4], as.numeric)
df
#  week value value.1 value.2
#1 12th     1       1       1
#2 13th    10      20     220
#3 14th    15     100      30

It also works with NAs:

d2<- data.frame(week=c("12th","13th","14th"),value=c(NA,NA,NA))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,NA))
dfList<- list(d1,d2,d3)

df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)
df
#  week value value.1 value.2
#1 12th     1    <NA>       1
#2 13th    10    <NA>     220
#3 14th    15    <NA>    <NA>
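Two properties of the t(unique(t(...))) idiom are worth keeping in mind: transposing converts every column to character, and unique() would also drop a value column that happens to be identical to another one. A sketch of a variant that removes only the repeated week columns by name is shown below; the value1..value3 names are hypothetical, and rows are still matched by position.

```r
d1 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 10, 15))
d2 <- data.frame(week = c("12th", "13th", "14th"), value = c(NA, NA, NA))
d3 <- data.frame(week = c("12th", "13th", "14th"), value = c(1, 220, NA))
dfList <- list(d1, d2, d3)

# Column-bind everything: week, value, week, value, week, value
wide <- do.call(cbind, dfList)

# Keep the first week column and every value column
keep <- !(names(wide) == "week" & duplicated(names(wide)))
wide <- wide[keep]

# Assumption: name the value columns by their position in the list
names(wide) <- c("week", paste0("value", seq_along(dfList)))
wide
```

Since nothing is transposed, the numeric columns stay numeric and no as.numeric conversion is needed afterwards.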
