R Time Series Gap Fill for plotting with type = 'b'
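I have a .csv file with four columns (NAME, ID, YEAR, VALUE; see the example below) and want to make some time series plots using plot('YEAR','VALUE',type ='b'). Because data are missing for some YEARS in the series, I would like to add new rows containing NA values for the years in between, so that the data are plotted without connecting lines across the YEAR gaps (in my example: fill in NA values for the item BARTLEY for the years 1984 to 1987).
Is there a way to do this? Any help is much appreciated. Thanks!
My .csv file looks like this: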
NAME; ID; YEAR; VALUE
NAME1; 885; 1988; -2
NAME1; 885; 1989; 0
NAME2; 2665; 1999; 4
NAME2; 2665; 2000; 8
NAME2; 2665; 2001; 19
NAME2; 2665; 2002; 13
NAME2; 2665; 2003; 13
NAME3; 893 ; 1983; 0
NAME3; 893 ; 1988; 2
NAME3; 893 ; 1989; -1
NAME4; 877 ; 1972; -1
NAME5; 894 ; 1973; -3
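You can read the file you show by passing sep=";" to read.csv so that the separate values are identified. You might consider something like the code below to read the data, fix the dates, add the NAs, and plot the chart. I put your data in a file named "plot_test.txt" so that read.csv gets the data from there. Also, based on your comment about the BARTLEY item, I have assumed that you want a separate line for each item in the plot.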
# read data file into xx and change character years to Date values
xx <- read.csv("plot_test.txt",header=TRUE,sep=";")
xx$YEAR <- as.Date(paste(as.character(xx$YEAR),"-01-01",sep=""))
# create df as a template: every NAME/ID pair crossed with every year
# (unique() on the pair keeps NAME and ID aligned even if IDs repeat)
date_seq <- seq(min(xx$YEAR),max(xx$YEAR),by="12 month")
df <- merge(unique(xx[,c("NAME","ID")]),data.frame(YEAR=date_seq,VALUE=NA),all=TRUE)
# create unique names in xx and df to merge on
xx$NAME_YR <- paste(xx$NAME,xx$YEAR,sep="")
df$NAME_YR <- paste(df$NAME,df$YEAR,sep="")
# merge keeping only real data columns and restore original column names
xy <- merge(xx, df,by="NAME_YR",all=TRUE)[,c("NAME.y","ID.y","YEAR.y","VALUE.x")]
names(xy) <- names(xx)[1:4]
# plot xy using ggplot
library(ggplot2)
sp <- ggplot(data=xy, aes(x=YEAR, y=VALUE, colour=NAME)) + geom_point() + geom_line()
plot(sp)
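Since the question asks about base plot() with type = 'b', here is a minimal base-R sketch of the same gap-filling idea. It is an illustration under assumptions, not the code from the answer above: the data frame is typed in by hand from a few of the sample rows instead of being read from the file, YEAR is kept as a plain number, and the helper names fill_gaps and plot_item are made up for this example. Each item is only filled between its own first and last year.

```r
# A few rows from the question's sample, entered by hand (assumption:
# this stands in for the data frame returned by read.csv).
xx <- data.frame(
  NAME  = c("NAME1", "NAME1", "NAME3", "NAME3", "NAME3"),
  ID    = c(885, 885, 893, 893, 893),
  YEAR  = c(1988, 1989, 1983, 1988, 1989),
  VALUE = c(-2, 0, 0, 2, -1)
)

# Hypothetical helper: for one item, list every year from its first to its
# last observation and left-join the observed values onto that grid.
# Years without an observation end up with VALUE = NA.
fill_gaps <- function(d) {
  grid <- data.frame(NAME = d$NAME[1], ID = d$ID[1],
                     YEAR = seq(min(d$YEAR), max(d$YEAR)))
  merge(grid, d, by = c("NAME", "ID", "YEAR"), all.x = TRUE)
}

# Apply the helper to each item and stack the results.
filled <- do.call(rbind, lapply(split(xx, xx$NAME), fill_gaps))
rownames(filled) <- NULL

# Hypothetical helper: base plot of one item; with type = "b" the
# connecting line is interrupted wherever VALUE is NA.
plot_item <- function(d) {
  plot(d$YEAR, d$VALUE, type = "b",
       xlab = "Year", ylab = "Value", main = d$NAME[1])
}
```

Calling plot_item(filled[filled$NAME == "NAME3", ]) should then draw the 1983 point on its own and connect only 1988 and 1989, because the rows for 1984 to 1987 hold NA.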
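Glad to hear you figured it out. I was still wondering about the number of plots per page. I have added a few lines to the code that let you set the number of rows and columns of plots shown on one page, and then loop through as many pages of plots as needed. I have also added some ggplot settings to change the appearance of the plot text.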
# read data file into xx and change character years to Date values
xx <- read.csv("plot_test.txt",header=TRUE,sep=";")
xx$YEAR <- as.Date(paste(as.character(xx$YEAR),"-01-01",sep=""))
xx$NAME_YR <- paste(xx$NAME,xx$YEAR,sep="")
# create Year template for years between min and max years for each NAME
xxmin <- as.Date(tapply(xx$YEAR, xx$NAME, min ), origin="1970-01-01")
xxmax <- as.Date(tapply(xx$YEAR, xx$NAME, max ), origin="1970-01-01")
xxdates <- mapply(seq, xxmin, xxmax, by="12 month")
xxyears <- data.frame(NAME=rep(names(xxdates), sapply(xxdates, length)),
                      YEAR=as.Date(unlist(xxdates),origin="1970-01-01"))
xxyears$NAME_YR <- paste(xxyears$NAME,xxyears$YEAR,sep="")
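# merge template and data and assign colnames to plotting data
xy <- merge(xx, xxyears, by="NAME_YR", all=TRUE)[,c("NAME.y","ID","YEAR.y","VALUE")]
names(xy) <- c("NAME","ID","YEAR","VALUE")
# make NAME a factor so nlevels() and levels() below work
# (since R 4.0, character columns are no longer converted to factors by default)
xy$NAME <- factor(xy$NAME)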
# plot each NAME in a separate chart with own time axis
library(ggplot2)
rows_pg <- 2 # number of rows of plots per page
cols_pg <- 2 # number of columns of plots per page
chts_pg <- rows_pg*cols_pg
num_plots <- nlevels(xy$NAME)
# set plot axis labels and main titles and set values for text
spttl <- ggtitle("Your plot title\nSecond line of your plot title")
spaxlb <- labs ( x="Year", y="Data Values")
spth <- theme(plot.title=element_text(size=16, face="bold", colour="blue") )
spth <- spth + theme(axis.title.x= element_text(size=14, colour="blue") )
spth <- spth + theme(axis.title.y = element_text(size=14, colour="blue") )
spth <- spth + theme(axis.text = element_text(size=14, colour="black") )
spth <- spth + theme(strip.text = element_text(size=14, colour="brown"))
# generate plots, one page of up to chts_pg charts per loop pass
for( iplt in seq(1,num_plots, chts_pg) ) {
  sp <- ggplot(data=xy[xy$NAME %in% levels(xy$NAME)[iplt:(iplt+chts_pg-1)], ],
               aes(x=YEAR, y=VALUE)) + geom_line() + geom_point()
  sp <- sp + facet_wrap(~ NAME, scales="free_x", nrow=rows_pg, ncol=cols_pg)
  plot(sp + spttl + spaxlb + spth)
}
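The paging in the loop above can be checked on its own, without drawing anything. The sketch below is an illustration only: item_names is a made-up vector standing in for levels(xy$NAME), and splitting by a page number is an alternative way of writing the same grouping, not what the loop itself does.

```r
# Hypothetical item names standing in for levels(xy$NAME).
item_names <- paste0("NAME", 1:5)

rows_pg <- 2                  # rows of plots per page
cols_pg <- 2                  # columns of plots per page
chts_pg <- rows_pg * cols_pg  # charts per page

# Index of the first item on each page, as used by the loop.
starts <- seq(1, length(item_names), chts_pg)

# Alternative grouping: give each item a page number, then split by page.
page_id <- ceiling(seq_along(item_names) / chts_pg)
pages <- split(item_names, page_id)

# The loop's slice on the last page runs past the end of the names and
# pads with NA; %in% ignores those entries as long as NAME itself has no NA.
last_slice <- item_names[starts[2]:(starts[2] + chts_pg - 1)]
```

With five items and a 2 x 2 layout this gives two pages, the second holding a single chart, so the last page of plots will be mostly empty.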