In R, convert data frame diagonals to rows
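I am developing a model that predicts fertility for an age cohort. I currently have a data frame like this, where the rows are ages and the columns are years. The value of each cell is the age-specific fertility rate for that year: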
> df1
   iso3    sex age fert1953 fert1954 fert1955
14  AUS female  13    0.000  0.00000  0.00000
15  AUS female  14    0.000  0.00000  0.00000
16  AUS female  15   13.108 13.42733 13.74667
17  AUS female  16   26.216 26.85467 27.49333
18  AUS female  17   39.324 40.28200 41.24000
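However, what I want is for each row to be a cohort. Because both the rows and the columns represent single years, the cohort data can be obtained by taking the diagonals. I am looking for a result like this: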
> df2
   iso3    sex ageIn1953 fert1953 fert1954 fert1955
14  AUS female        13    0.000  0.00000 13.74667
15  AUS female        14    0.000 13.42733 27.49333
16  AUS female        15   13.108 26.85467 41.24000
17  AUS female        16   26.216 40.28200 [data..]
18  AUS female        17   39.324 [data..] [data..]
Here is the df1 data frame:
df1 <- structure(list(iso3 = c("AUS", "AUS", "AUS", "AUS", "AUS"), sex = c("female",
"female", "female", "female", "female"), age = c(13, 14, 15,
16, 17), fert1953 = c(0, 0, 13.108, 26.216, 39.324), fert1954 = c(0,
0, 13.4273333333333, 26.8546666666667, 40.282), fert1955 = c(0,
0, 13.7466666666667, 27.4933333333333, 41.24)), .Names = c("iso3",
"sex", "age", "fert1953", "fert1954", "fert1955"), class = "data.frame", row.names = 14:18)
Edit:
Here is the solution I ended up using. It is based on David's answer, but I needed to apply it to each level of iso3.
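library(dplyr)  # provides lead()
# f3 appears to be the full data frame covering every country (not shown here)
df.ls <- lapply(split(f3, f = f3$iso3), FUN = function(df1) {
  n <- ncol(df1) - 4  # number of year columns that need shifting
  temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
  return(cbind(df1[seq_len(4)], temp))
})
f4 <- do.call("rbind", df.ls)  # stack the per-country results back together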
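I haven't tested the speed, but data.table v1.9.5 recently implemented a new lead/lag function (written in C) called shift. So, for the columns you want to shift, you can use it in combination with mapply, for example: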
library(data.table)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) shift(x, n = y, type = "lead"), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp) # combining back with the unchanged columns
#    iso3    sex age fert1953 fert1954 fert1955
# 14  AUS female  13    0.000  0.00000 13.74667
# 15  AUS female  14    0.000 13.42733 27.49333
# 16  AUS female  15   13.108 26.85467 41.24000
# 17  AUS female  16   26.216 40.28200       NA
# 18  AUS female  17   39.324       NA       NA
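What the lead shift does to each column can be illustrated in base R alone. The sketch below is not part of data.table or dplyr; `lead_by` is a hypothetical helper name used only for illustration. It drops the first `k` values of a vector and pads the end with `NA`, which is how the third year column ends up two rows higher than where it started.

```r
# Hypothetical helper (not a data.table or dplyr function):
# move every value k positions toward the start and pad the tail with NA.
lead_by <- function(x, k) {
  kept <- x[seq_along(x) > k]                  # values that survive the shift
  c(kept, rep(NA, length(x) - length(kept)))   # pad back to the original length
}

# fert1955 is the third year column, so it is shifted up by 2 rows
fert1955 <- c(0, 0, 13.7466666666667, 27.4933333333333, 41.24)
shifted <- lead_by(fert1955, 2)
print(shifted)
```

The first value of the shifted vector is the rate for age 15 in 1955, which belongs to the cohort that was 13 in 1953.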
Edit: You can easily install the development version of data.table from GitHub:
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
Either way, if you prefer dplyr, here it is:
library(dplyr)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp)
#    iso3    sex age fert1953 fert1954 fert1955
# 14  AUS female  13    0.000  0.00000 13.74667
# 15  AUS female  14    0.000 13.42733 27.49333
# 16  AUS female  15   13.108 26.85467 41.24000
# 17  AUS female  16   26.216 40.28200       NA
# 18  AUS female  17   39.324       NA       NA
Here is a base R approach:
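# Note: this overwrites the year columns of df1 in place
df1[, 5:ncol(df1)] <- mapply(function(x, y) {
  vec.list <- df1[-1:-y, x]      # drop the first y values of column x
  length(vec.list) <- nrow(df1)  # pad back to the full length with NA
  vec.list
}, x = 5:ncol(df1), y = 1:(ncol(df1) - 4))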
df1
#   iso3    sex age fert1953 fert1954 fert1955
#14  AUS female  13    0.000  0.00000 13.74667
#15  AUS female  14    0.000 13.42733 27.49333
#16  AUS female  15   13.108 26.85467 41.24000
#17  AUS female  16   26.216 40.28200       NA
#18  AUS female  17   39.324       NA       NA
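The same idea can be wrapped in a small function that returns a new data frame instead of overwriting df1, and that takes the position of the first year column as an argument rather than hard-coding 4 and 5. This is only a sketch under stated assumptions: `cohort_shift` and `first_year_col` are hypothetical names, and the function assumes every column after `first_year_col` is a consecutive year column, as in the question's data.

```r
# Hypothetical helper: shift the k-th column after `first_year_col` up by k rows,
# so that each row follows one cohort across years. Vacated cells become NA.
cohort_shift <- function(df, first_year_col) {
  n_rows <- nrow(df)
  later_cols <- which(seq_len(ncol(df)) > first_year_col)
  for (k in seq_along(later_cols)) {
    col <- df[[later_cols[k]]]
    kept <- col[seq_len(n_rows) > k]                       # drop the first k values
    df[[later_cols[k]]] <- c(kept, rep(NA, n_rows - length(kept)))
  }
  df
}

# Same data as df1 in the question
df1 <- data.frame(
  iso3 = rep("AUS", 5),
  sex = rep("female", 5),
  age = c(13, 14, 15, 16, 17),
  fert1953 = c(0, 0, 13.108, 26.216, 39.324),
  fert1954 = c(0, 0, 13.4273333333333, 26.8546666666667, 40.282),
  fert1955 = c(0, 0, 13.7466666666667, 27.4933333333333, 41.24),
  row.names = 14:18,
  stringsAsFactors = FALSE
)

df2 <- cohort_shift(df1, first_year_col = 4)  # fert1953 is column 4
print(df2)
```

Because the function works on a copy, df1 keeps its original values, which makes it easier to rerun or to apply per country with split and lapply.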