
In R, convert data frame diagonals to rows

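I am developing a model that predicts the fertility of an age cohort. I currently have a data frame like this, where the rows are ages and the columns are years. The value of each cell is the age-specific fertility rate for that year: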

> df1
   iso3    sex age fert1953 fert1954 fert1955
14  AUS female  13    0.000  0.00000  0.00000
15  AUS female  14    0.000  0.00000  0.00000
16  AUS female  15   13.108 13.42733 13.74667
17  AUS female  16   26.216 26.85467 27.49333
18  AUS female  17   39.324 40.28200 41.24000

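However, what I want is for each row to be a cohort. Because both the rows and the columns step through single years, the cohort data can be obtained by taking the diagonals. I am looking for a result like this: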

> df2
   iso3    sex ageIn1953 fert1953  fert1954  fert1955
14  AUS female        13    0.000   0.00000  13.74667
15  AUS female        14    0.000  13.42733  27.49333
16  AUS female        15   13.108  26.85467  41.24000
17  AUS female        16   26.216  40.28200  [data..] 
18  AUS female        17   39.324  [data..]  [data..] 

Here is the df1 data frame:

df1 <- structure(list(iso3 = c("AUS", "AUS", "AUS", "AUS", "AUS"), sex = c("female", 
"female", "female", "female", "female"), age = c(13, 14, 15, 
16, 17), fert1953 = c(0, 0, 13.108, 26.216, 39.324), fert1954 = c(0, 
0, 13.4273333333333, 26.8546666666667, 40.282), fert1955 = c(0, 
0, 13.7466666666667, 27.4933333333333, 41.24)), .Names = c("iso3", 
"sex", "age", "fert1953", "fert1954", "fert1955"), class = "data.frame", row.names = 14:18)
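
To make the "diagonal" idea concrete, here is a minimal sketch using base R matrix indexing. The matrix `m` and the name `cohort13` are illustrative only, and the values are the rounded ones from the printed table rather than the full-precision data.

```r
# Age-by-year matrix with the (rounded) fertility values from df1.
# matrix() fills column by column, so each group of five values is one year.
m <- matrix(
  c(0, 0, 13.108, 26.216, 39.324,
    0, 0, 13.42733, 26.85467, 40.282,
    0, 0, 13.74667, 27.49333, 41.24),
  nrow = 5,
  dimnames = list(age = 13:17, year = 1953:1955)
)

# The cohort aged 13 in 1953 is 14 in 1954 and 15 in 1955, so its values
# sit at (row 1, col 1), (row 2, col 2), (row 3, col 3): a diagonal.
cohort13 <- m[cbind(1:3, 1:3)]
cohort13
```

The three values returned correspond to the first row of the desired df2. Starting the row index one step lower, as in `m[cbind(2:4, 1:3)]`, gives the cohort aged 14 in 1953.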

Edit:

Here is the solution I ended up using. It is based on David's answer, but I needed to do this for each level of iso3.

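library(dplyr) # lead() comes from dplyr
# f3 is the full data frame covering every iso3 level (not shown here)
df.ls <- lapply(split(f3, f = f3$iso3), FUN = function(df1) {
  n <- ncol(df1) - 4 # the number of years - 1
  temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
  return(cbind(df1[seq_len(4)], temp))
})
f4 <- do.call("rbind", df.ls) # stack the per-country results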
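
The reason for splitting by iso3 is that a shift applied to the whole data frame would pull one country's values into the last rows of the previous country. Below is a minimal, package-free sketch of the same pattern. The two-country data frame, the helper names `lead_by` and `shift_diagonals`, and the `n_id` parameter are all hypothetical and only serve the illustration.

```r
# Hypothetical two-country input; the values are invented for illustration.
f3 <- data.frame(
  iso3 = rep(c("AUS", "NZL"), each = 3),
  sex = "female",
  age = rep(13:15, times = 2),
  fert1953 = c(1, 2, 3, 10, 20, 30),
  fert1954 = c(4, 5, 6, 40, 50, 60),
  fert1955 = c(7, 8, 9, 70, 80, 90)
)

# Base R stand-in for a lead function: indexing past the end yields NA.
lead_by <- function(x, k) x[seq_along(x) + k]

# Shift every column after the first n_id columns up by 1, 2, ... rows.
shift_diagonals <- function(d, n_id = 4) {
  n <- ncol(d) - n_id
  shifted <- mapply(lead_by, d[, -seq_len(n_id), drop = FALSE], seq_len(n))
  cbind(d[seq_len(n_id)], shifted)
}

# Apply the shift within each country, then stack the pieces again.
f4 <- do.call("rbind", lapply(split(f3, f3$iso3), shift_diagonals))
f4
```

Each country's block ends in its own NA values, which shows that nothing leaked across the group boundary.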

I haven't tested the speed, but data.table v1.9.5 recently gained a new lead/lag function, written in C, called shift.

So, for the columns you want to shift, you can combine it with mapply, for example:

library(data.table)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) shift(x, n = y, type = "lead"), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp) # combining back with the unchanged columns
#    iso3    sex age fert1953 fert1954 fert1955
# 14  AUS female  13    0.000  0.00000 13.74667
# 15  AUS female  14    0.000 13.42733 27.49333
# 16  AUS female  15   13.108 26.85467 41.24000
# 17  AUS female  16   26.216 40.28200       NA
# 18  AUS female  17   39.324       NA       NA

Edit: You can easily install the development version of data.table from GitHub:

library(devtools) 
install_github("Rdatatable/data.table", build_vignettes = FALSE)

Either way, if you prefer dplyr, here is the equivalent:

library(dplyr)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp)
#    iso3    sex age fert1953 fert1954 fert1955
# 14  AUS female  13    0.000  0.00000 13.74667
# 15  AUS female  14    0.000 13.42733 27.49333
# 16  AUS female  15   13.108 26.85467 41.24000
# 17  AUS female  16   26.216 40.28200       NA
# 18  AUS female  17   39.324       NA       NA

Here is a base R approach:

df1[, 5:ncol(df1)] <- mapply(function(x, y) {
  vec.list <- df1[-1:-y, x]      # drop the first y rows of column x
  length(vec.list) <- nrow(df1)  # pad the end with NA up to the original length
  vec.list
}, x = 5:ncol(df1), y = 1:(ncol(df1) - 4))
df1
#   iso3    sex age fert1953 fert1954 fert1955
#14  AUS female  13    0.000  0.00000 13.74667
#15  AUS female  14    0.000 13.42733 27.49333
#16  AUS female  15   13.108 26.85467 41.24000
#17  AUS female  16   26.216 40.28200       NA
#18  AUS female  17   39.324       NA       NA
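
All of the approaches above hard-code the position of the first year column (4 or 5). As a hedged variation, the year columns can be located by name instead. This sketch assumes that every year column starts with the prefix `fert`, that the columns are in chronological order, and that the years are consecutive; the result name `df2` is only illustrative, and the values are the rounded ones from the printed table.

```r
df1 <- data.frame(
  iso3 = "AUS", sex = "female", age = 13:17,
  fert1953 = c(0, 0, 13.108, 26.216, 39.324),
  fert1954 = c(0, 0, 13.42733, 26.85467, 40.282),
  fert1955 = c(0, 0, 13.74667, 27.49333, 41.24)
)

# Locate the year columns by name rather than by position.
year_cols <- grep("^fert", names(df1))

# Shift the k-th year column up by k - 1 rows, leaving df1 itself untouched.
# Indexing past the last row yields NA, which pads the end of each column.
df2 <- df1
for (k in seq_along(year_cols)) {
  col <- year_cols[k]
  df2[[col]] <- df1[[col]][seq_len(nrow(df1)) + (k - 1)]
}
df2
```

Writing into a copy rather than overwriting df1 keeps the original age-by-year layout available for comparison.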

