I'm working on a forecasting project for university. I have a large database of demand between pairs of cities, and I know that this dataset is contaminated, but I do not know which data points are affected. It is a panel dataset that tracks monthly demand for each city pair. Below is a part of the data I am working with.
Month CAI.JED CAI.RUH ADD.DXB CAI.IST ALG.IST
2013-01-01 19196 14777 16 1413 12
2013-02-01 19913 8 18203 1026 5
2013-03-01 34242 11751 17836 985 1
2013-04-01 23481 12000 13479 948 27
2013-05-01 24428 16046 16391 954 9
2013-06-01 31791 23479 16571 1 4
2013-07-01 33716 20090 11323 0 5724
2013-08-01 35553 2 11121 0 0
2013-09-01 18746 13423 12119 0 26
2013-10-01 10 12223 10239 0 0
2013-11-01 19 20234 14231 5 2
2013-12-01 15198 1 12132 10 5
The dataset is a combination of two datasets. The people who provided the data told me that in some months only one of the two datasets was working. However, it is not known which dataset is available in which months.
Now comes my question: for the next part of the project, I need annual demand figures. Since I know the figures are contaminated, I would like to remove outliers first. What techniques are available in R to do this?
As the data is in time-series format, I tried the tsoutliers package (see http://cran.r-project.org/web/packages/tsoutliers/tsoutliers.pdf ). However, I could not get it working. I also tried the suggestions from https://stats.stackexchange.com/questions/104882/detecting-outliers-in-time-series-ls-ao-tc-using-tsoutliers-package-in-r-how/104946#104946 , but they did not work either.
Once I know which points are outliers, I would like to either replace them (e.g. with the mean for that route) or, if too many points are missing, reject the entire route from the dataset.
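A minimal Python sketch of the replace-or-reject step, followed by the annual totals the project needs. So that it runs on its own, it uses a deliberately crude placeholder detector: a month is flagged if it falls below 10% of the route's median. Both that `low_ratio` and the 25% `max_flagged` rejection threshold are hypothetical, untuned values, and the function names are illustrative only. In practice you would plug in whatever outlier flags you end up with (tsoutliers, DBSCAN, ...).

```python
from statistics import mean, median


def clean_route(demand, low_ratio=0.1, max_flagged=0.25):
    """Replace suspect months with the route mean, or return None to reject the route.

    low_ratio and max_flagged are hypothetical, untuned thresholds.
    """
    # Placeholder detector: a month is suspect if it is below low_ratio * median.
    threshold = low_ratio * median(demand)
    flagged = [v < threshold for v in demand]
    if sum(flagged) / len(demand) > max_flagged:
        return None  # too many suspect months: drop the whole route
    # Mean over the months that were NOT flagged, used as the replacement value.
    fill = mean(v for v, bad in zip(demand, flagged) if not bad)
    return [fill if bad else v for v, bad in zip(demand, flagged)]


def annual_totals(routes):
    """Sum the cleaned monthly demand per route; rejected routes are left out."""
    totals = {}
    for route, demand in routes.items():
        cleaned = clean_route(demand)
        if cleaned is not None:
            totals[route] = sum(cleaned)
    return totals


if __name__ == "__main__":
    routes_2013 = {
        "CAI.JED": [19196, 19913, 34242, 23481, 24428, 31791,
                    33716, 35553, 18746, 10, 19, 15198],
        "CAI.IST": [1413, 1026, 985, 948, 954, 1, 0, 0, 0, 0, 5, 10],
    }
    print(annual_totals(routes_2013))
```

On the 2013 sample, CAI.JED keeps 10 of 12 months and October/November are filled with the mean of the rest (25626.4), giving an annual total of about 307516.8. CAI.IST has four months below its threshold (4/12 > 25%), so it is rejected. Note that this placeholder rule only catches drops; a spike such as ALG.IST's 5724 in July would pass through unflagged.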
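I prefer a density-based clustering algorithm such as DBSCAN. By tuning epsilon (eps) and the minimum number of samples (min_samples in scikit-learn, minPts in R's dbscan package), you can filter outliers quite precisely, using a plot to check the result. In scikit-learn, points labelled -1 are the outliers; R's dbscan package labels them as cluster 0 instead.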
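A minimal sketch of the DBSCAN idea applied to one route at a time. To stay self-contained it hand-rolls a 1-D DBSCAN rather than calling scikit-learn's sklearn.cluster.DBSCAN, but it follows the same conventions: noise gets label -1 and min_samples counts the point itself. Clustering on a log10(demand + 1) scale is an assumption made here so that one eps can serve routes of very different size, and eps=0.3, min_samples=3 are illustrative, untuned values. The helper names are hypothetical.

```python
import math


def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN on 1-D data. Returns one label per value; -1 marks noise."""
    n = len(values)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Region query: every point within eps of point i, including i itself.
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned (or just re-labelled as border)
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


def flag_outliers(demand, eps=0.3, min_samples=3):
    """Cluster log10(demand + 1); return True for months DBSCAN labels as noise."""
    logged = [math.log10(v + 1) for v in demand]
    return [label == -1 for label in dbscan_1d(logged, eps, min_samples)]


if __name__ == "__main__":
    routes = {
        "CAI.JED": [19196, 19913, 34242, 23481, 24428, 31791,
                    33716, 35553, 18746, 10, 19, 15198],
        "CAI.RUH": [14777, 8, 11751, 12000, 16046, 23479,
                    20090, 2, 13423, 12223, 20234, 1],
    }
    for route, demand in routes.items():
        flags = flag_outliers(demand)
        print(route, [i for i, bad in enumerate(flags) if bad])
```

With these settings the near-zero months come out as noise: indices 9 and 10 (October, November) for CAI.JED and 1, 7, 11 (February, August, December) for CAI.RUH. One caveat: if most of a route's months are corrupted, as in CAI.IST, the bad values can form a dense cluster of their own instead of being labelled noise, so a route that splits into more than one cluster is worth inspecting by hand.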