Seasonal adjustment of multiple series ignoring NA and zero columns

I have a dataframe with 105 months and 20 columns. The example below is simplified and shows that some of the columns start at January 2014 and some don't. Some others are zeroed:

df <- data.frame(months = c('2014-01-01','2014-02-01',
                    '2014-03-01','2014-04-01','2014-05-01',
                    '2014-06-01','2014-07-01'),
            series2 = c(1754,3345,12226,1712,6703,8172,1545),
            series3 = c(NA,NA,NA,NA,554,222,321),
            series4 = c(NA,NA,NA,NA,0,0,0)
            )

My objective is to seasonally adjust the series that can be seasonally adjusted and write a similar dataframe, keeping the seasonally adjusted series in the same order and position as in the original dataframe.

I have made a for loop to decide which columns can be seasonally adjusted. The for loop also finds out the initial date of every column.

library(seasonal)
library(lubridate)   # provides year(), month(), months() and %m+%

# determine initial and final date in the first column of dataframe

initial_date <- as.POSIXct(df[[1]][1], format = "%Y-%m-%d")
final_date <- as.POSIXct(df[[1]][nrow(df)], format = "%Y-%m-%d")

# create an empty dataframe to be completed with seasonally adjusted data
seas_adj_df<-data.frame(matrix(ncol = ncol(df), nrow = nrow(df)))

# decide which series should be seasonally adjusted

for(i in 2:ncol(df)) {                # Head of for-loop
    # if a certain column contains only zeros...
    if(sum(df[,i] != 0, na.rm=TRUE)==0) {   
        seas_adj_df[,i]<-as.numeric(NA)       #fill the column with NA
    } else {
    #determine the number of values of the column
        n_values_column<-length(df[,i][!is.na(df[,i])])
    #how many months after the beginning of the dataframe did the column start?
        months_to_add<-nrow(df)-n_values_column
    #calculates the initial date of the column
        column_initial_date<-initial_date %m+% months(months_to_add)
    #transform the column values into a time series
       time_series <- ts(df[,i],start = c(year(column_initial_date),
                    month(column_initial_date)),
            end = c(year(final_date), month(final_date)),
            freq = 12)
    #perform seasonal adjustment
        time_series_sa<- final(seas(time_series, multimode = "R"))
    #insert seasonally adjusted series into the new dataframe
        seas_adj_df[,i]<-time_series_sa   #this part is wrong
}}

However, I receive the following error:

Error in `[<-.data.frame`(`*tmp*`, , i, value = c(928.211662624947, 993.311013042665,  : replacement has 81 rows, data has 105

This happens because some of my columns have 81 values (the rest are filled with "NA").

My two questions are:

  1. Is there a way to seasonally adjust all series, but asking R to "jump" NA columns and 0 columns? My final dataframe must have the seasonally adjusted series in the exact same position as the original ones (example: series5_SA must be in column 5, even if series4 couldn't be seasonally adjusted).

  2. Using my code (or similar code), how could I add a time series with 81 values into a dataframe with 105 rows? How can I tell R that the column should be inserted after the first 24 (105-81) rows?

You can use lapply together with an error-handling function such as tryCatch, instead of a for loop. tryCatch attempts an operation and, if it encounters an error, returns a fallback value that you specify (here, the original ts, not seasonally adjusted). The order of the time series will not be affected. Here is an example with the AirPassengers data set:

> library(seasonal)
> library(lubridate)
> 
> data(AirPassengers)
> 
> df <- replicate(5, AirPassengers)
> df <- cbind.data.frame(date_decimal(as.numeric(time(AirPassengers))), df)
> 
> ## Adding NA to second and fourth data columns
> df[sample(1:nrow(df), 10), 3] <- NA  
> df[sample(1:nrow(df), 10), 5] <- NA 
> 
> initial_date <- as.Date(df[1,1], format = "%Y-%m-%d")
> 
> time_series <- lapply(df[, -1], function(x){
+   ts(x, start = c(year(initial_date), month(initial_date)), frequency = 12)
+ })
> 
> time_series_sa <- lapply(time_series, function(x) {
+   tryCatch(final(seas(x, multimode = "R")), error = function(e) {x})
+ })
> 
> summary(time_series_sa)
  Length Class Mode   
1 144    ts    numeric
2 144    ts    numeric
3 144    ts    numeric
4 144    ts    numeric
5 144    ts    numeric
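
The result above is a list of ts objects rather than a dataframe, and it does not cover columns that start later than the others. Below is a minimal sketch of one way to write the results back into the original column positions, padding shorter series with leading NA. It assumes that each column, once it starts, runs to the last row of the dataframe. The names pad_leading_na and adjust_columns are hypothetical helpers, and demean is only a stand-in for the real adjustment so that the example runs without X-13:

```r
# Hypothetical helper: left-pad a shorter result with NA so it fills n_rows rows.
pad_leading_na <- function(values, n_rows) {
  values <- as.numeric(values)
  c(rep(NA_real_, n_rows - length(values)), values)
}

# Hypothetical helper: apply adjust_fun to every column except the first (dates).
# Columns that are all NA or all zero, or for which adjust_fun fails,
# are filled with NA; every column keeps its original position.
adjust_columns <- function(df, start, adjust_fun, frequency = 12) {
  n_rows <- nrow(df)
  out <- df
  for (i in seq_len(ncol(df))[-1]) {
    x <- df[[i]]
    observed <- which(!is.na(x))
    if (length(observed) == 0 || all(x[observed] == 0)) {
      out[[i]] <- rep(NA_real_, n_rows)
      next
    }
    first <- observed[1]
    # Shift the overall start by the number of leading NA months;
    # ts() accepts a period index beyond 12 and rolls it into the next year.
    series <- ts(x[first:n_rows], start = start + c(0, first - 1),
                 frequency = frequency)
    adjusted <- tryCatch(adjust_fun(series), error = function(e) NULL)
    out[[i]] <- if (is.null(adjusted)) {
      rep(NA_real_, n_rows)
    } else {
      pad_leading_na(adjusted, n_rows)
    }
  }
  out
}

df <- data.frame(
  months  = c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
              "2014-05-01", "2014-06-01", "2014-07-01"),
  series2 = c(1754, 3345, 12226, 1712, 6703, 8172, 1545),
  series3 = c(NA, NA, NA, NA, 554, 222, 321),
  series4 = c(NA, NA, NA, NA, 0, 0, 0)
)

# Stand-in for the real adjustment (an assumption for illustration only).
demean <- function(s) s - mean(s)

result <- adjust_columns(df, start = c(2014, 1), adjust_fun = demean)
```

With real monthly data of sufficient length, the stand-in could be swapped for something like function(s) final(seas(s, multimode = "R")). Note that this version fills a column with NA when the adjustment fails, whereas the tryCatch call above falls back to the unadjusted series; either choice keeps the columns aligned.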

Hope it helps.
