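Reshaping wide to long with multiple varying variables and unbalanced time
I have two different sets of variables: annual percentage share and year. The annual percentage share columns run from 1999 to 2012, but the year columns run from 1990 to 2013.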
countrylabel annualpercentageshare.1999 year1990 year1991 year1992
1 Austria NA NA NA NA
2 Belgium NA NA NA NA
3 Bulgaria 48.20000 NA NA NA
4 Estonia NA NA NA NA
5 France 47.52853 NA NA NA
6 Germany NA NA NA NA
Something like this.
I have already tried this code:
merge_data2 <- reshape(merge_data2, varying = list(2:ncol(merge_data2)),
v.names = c("percentageshare", "Year"),
idvar = "countrylabel", direction = "long", times = 1990:2013)
But I get this error message:
"Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 'lengths(varying)' must all match 'length(times)'"
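To see where the mismatch comes from, it may help to count the columns in each group and compare them with times. The sketch below uses a small made-up data frame (toy, share_cols and year_cols are illustrative names, not part of the original data): varying = list(2:ncol(...)) is a single group holding every column, so its length cannot equal length(times), and the two groups themselves cover different numbers of years.

```r
# Hypothetical toy data with the same layout as the question's table
toy <- data.frame(
  countrylabel = c("Austria", "Bulgaria"),
  annualpercentageshare.1999 = c(NA, 48.2),
  year1990 = c(NA_real_, NA_real_),
  year1991 = c(NA_real_, NA_real_),
  year1992 = c(NA_real_, NA_real_)
)

# The call in the question passes ONE group containing every column ...
varying <- list(2:ncol(toy))
times <- 1990:2013

# ... so the group length and the number of times disagree
lengths(varying)  # number of columns in the single group
length(times)     # number of time points requested

# Counting each variable family separately shows the unbalanced panel
share_cols <- grep("^annualpercentageshare", names(toy), value = TRUE)
year_cols <- grep("^year", names(toy), value = TRUE)
length(share_cols)
length(year_cols)
```

reshape() needs one group per entry of v.names, and every group must contain exactly one column per time point.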
Edit: I would like a data frame like this:
countrylabel time annualpercentageshare year
Austria 1990 NA NA
Austria 1991 NA NA
library(tidyr); library(dplyr)
df %>%
gather(variable, value, -countrylabel) %>%
separate("variable", into = c("stat", "time"), sep = -4) %>%
spread(stat, value)
which yields
countrylabel time annualpercentageshare. year
1 Austria 1990 NA NA
2 Austria 1991 NA NA
3 Austria 1992 NA NA
4 Austria 1999 NA NA
5 Belgium 1990 NA NA
6 Belgium 1991 NA NA
7 Belgium 1992 NA NA
8 Belgium 1999 NA NA
9 Bulgaria 1990 NA NA
10 Bulgaria 1991 NA NA
11 Bulgaria 1992 NA NA
12 Bulgaria 1999 48.20000 NA
13 Estonia 1990 NA NA
14 Estonia 1991 NA NA
15 Estonia 1992 NA NA
16 Estonia 1999 NA NA
17 France 1990 NA NA
18 France 1991 NA NA
19 France 1992 NA NA
20 France 1999 47.52853 NA
21 Germany 1990 NA NA
22 Germany 1991 NA NA
23 Germany 1992 NA NA
24 Germany 1999 NA NA
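The same result can probably be reached with pivot_longer(), which replaces the gather()/spread() pair in recent tidyr versions. This is a sketch rather than the answer's own method: it assumes tidyr 1.1.0 or later (for names_transform), uses a made-up data frame toy, and relies on a regular expression that splits each column name into a stem and a trailing four-digit year, dropping the optional "." in between.

```r
library(tidyr)

# Hypothetical toy data with the same layout as the question's table
toy <- data.frame(
  countrylabel = c("Austria", "Bulgaria"),
  annualpercentageshare.1999 = c(NA, 48.2),
  year1990 = c(NA_real_, NA_real_),
  year1991 = c(NA_real_, NA_real_),
  year1992 = c(NA_real_, NA_real_)
)

# ".value" turns the first capture group (the stem) into output columns;
# the second capture group (the four digits) becomes the time column.
long <- pivot_longer(
  toy,
  cols = -countrylabel,
  names_to = c(".value", "time"),
  names_pattern = "^(.*?)\\.?(\\d{4})$",
  names_transform = list(time = as.integer)
)

# Sort by country and time; missing stem/time combinations are filled with NA
long <- long[order(long$countrylabel, long$time), ]
long
```

Because the stem is captured without the dot, the share column comes out as annualpercentageshare rather than annualpercentageshare. with a trailing dot.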
reshape likes "." as the separator between variable name and time, so we first insert one into the year* variable names.
names(d) <- gsub("year", "year.", names(d))
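Now, before we reshape, we add the missing columns (filled with NA) so that both variables cover the same years, and order the columns: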
d$annualpercentage.2002 <- NA
d$year.1999 <- NA
d <- d[c(1, order(names(d)[-1]) + 1)]
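With more years than in this small example, adding each missing column by hand gets tedious. The padding step above could be wrapped in a small helper along these lines; pad_columns, stems and times are hypothetical names introduced for illustration, and the sketch assumes every varying column is already named stem.time.

```r
# Hypothetical helper: make sure every stem.time column exists, then put the
# id columns first and the varying columns in sorted (stem, time) order.
pad_columns <- function(df, stems, times) {
  # All column names a balanced panel would have, e.g. "year.1999"
  wanted <- as.vector(outer(stems, times, paste, sep = "."))
  # Add each missing column, filled with NA
  for (nm in setdiff(wanted, names(df))) {
    df[[nm]] <- NA
  }
  # id columns keep their position in front of the sorted varying columns
  df[c(setdiff(names(df), wanted), sort(wanted))]
}

# Small made-up example: one share column and one year column are missing
toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  annualpercentage.1999 = c(1.5, NA),
  year.2000 = c(0.1, 0.2)
)
padded <- pad_columns(toy, stems = c("annualpercentage", "year"), times = 1999:2000)
names(padded)
```

After padding, each variable occupies a contiguous block of columns of the same length, which is what the varying list expects.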
Your idea then works once the two blocks of columns are given as separate elements of the list in varying:
res <- reshape(d, varying=list(2:5, 6:9), direction="long", idvar="countrylabel",
times=1999:2002, v.names=c("annualpercentage", "year"))
res
# countrylabel time annualpercentage year
# Austria.1999 Austria 1999 NA NA
# Belgium.1999 Belgium 1999 NA NA
# Bulgaria.1999 Bulgaria 1999 -0.6806495 NA
# Estonia.1999 Estonia 1999 NA NA
# France.1999 France 1999 NA NA
# Germany.1999 Germany 1999 NA NA
# Switzerland.1999 Switzerland 1999 -1.8497570 NA
# Austria.2000 Austria 2000 -0.6033900 0.14970015
# Belgium.2000 Belgium 2000 NA -0.49201756
# Bulgaria.2000 Bulgaria 2000 0.8263925 -0.36320990
# Estonia.2000 Estonia 2000 NA -2.51032544
# France.2000 France 2000 NA 0.57800624
# Germany.2000 Germany 2000 NA -0.52295712
# Switzerland.2000 Switzerland 2000 0.2783076 0.25616728
# Austria.2001 Austria 2001 -2.6962484 -0.15375642
# Belgium.2001 Belgium 2001 1.3088577 0.72528621
# Bulgaria.2001 Bulgaria 2001 NA NA
# Estonia.2001 Estonia 2001 NA -0.05563662
# France.2001 France 2001 0.2224629 0.74205086
# Germany.2001 Germany 2001 NA -0.01185349
# Switzerland.2001 Switzerland 2001 0.8354322 -1.40826638
# Austria.2002 Austria 2002 NA NA
# Belgium.2002 Belgium 2002 NA 1.60874778
# Bulgaria.2002 Bulgaria 2002 NA NA
# Estonia.2002 Estonia 2002 NA 0.55866704
# France.2002 France 2002 NA -1.59866472
# Germany.2002 Germany 2002 NA -0.11217415
# Switzerland.2002 Switzerland 2002 NA NA
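Once the names follow the stem.time pattern and the panel is balanced, reshape() can also work out v.names and times on its own when varying is given as a plain vector of column names, since it splits names on sep = "." by default. A minimal sketch on made-up data (toy and res_auto are illustrative names):

```r
# Hypothetical balanced toy data with "." between stem and time
toy <- data.frame(
  countrylabel = c("Austria", "Belgium"),
  annualpercentage.1999 = c(1.5, NA),
  annualpercentage.2000 = c(NA, 2.5),
  year.1999 = c(NA_real_, NA_real_),
  year.2000 = c(0.1, 0.2)
)

# varying is a plain character vector, so reshape() guesses the variable
# names and the time values by splitting each column name at the "."
res_auto <- reshape(toy, varying = names(toy)[-1], direction = "long",
                    idvar = "countrylabel")
res_auto
```

This avoids hard-coding the column positions and the times vector, at the cost of relying on the naming convention.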
Data
d <- structure(list(countrylabel = c("Austria", "Belgium", "Bulgaria",
"Estonia", "France", "Germany", "Switzerland"), annualpercentage.1999 = c(NA,
-2.58060150400384, -0.0623757258909573, 0.267776001395166, NA,
NA, 0.048219924249952), annualpercentage.2000 = c(NA, -0.249416955035044,
1.3525450891501, 1.04446768824697, NA, -0.0582347596434839, -0.891400228849837
), annualpercentage.2001 = c(1.82469277697851, NA, NA, 1.04231605324821,
NA, -0.900145118946308, -1.19320727433597), year2000 = c(0.633712375393134,
NA, 1.24760861316098, -0.092964787061478, -0.59403260962332,
NA, -0.650348234181285), year2001 = c(0.587318286831079, NA,
NA, 0.348890470222513, NA, NA, NA), year2002 = c(0.0645316087966406,
-0.279456557428068, NA, NA, -0.0627400036074545, 1.30419117694731,
-0.484654596062051)), row.names = c(NA, -7L), class = "data.frame")