I have a time-series data set, available in Excel format here. I would like to cluster the data using k-means, but I get an error.
Please note that FinDat is my data from the attached source.
> head(FinDat)
# A tibble: 6 x 10
date ISE...2 ISE...3 SP DAX FTSE NIKKEI BOVESPA EU
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2009-01-05 00:00:00 0.0358 0.0384 -0.00468 0.00219 3.89e-3 0 0.0312 0.0127
2 2009-01-06 00:00:00 0.0254 0.0318 0.00779 0.00846 1.29e-2 0.00416 0.0189 0.0113
3 2009-01-07 00:00:00 -0.0289 -0.0264 -0.0305 -0.0178 -2.87e-2 0.0173 -0.0359 -0.0171
4 2009-01-08 00:00:00 -0.0622 -0.0847 0.00339 -0.0117 -4.66e-4 -0.0401 0.0283 -0.00556
5 2009-01-09 00:00:00 0.00986 0.00966 -0.0215 -0.0199 -1.27e-2 -0.00447 -0.00976 -0.0110
6 2009-01-12 00:00:00 -0.0292 -0.0424 -0.0228 -0.0135 -5.03e-3 -0.0490 -0.0538 -0.0125
# ... with 1 more variable: EM <dbl>
silhouette_score <- function(k){
km <- kmeans(FinDat, centers = k, nstart=25)
ss <- silhouette(km$cluster, dist(FinDat))
mean(ss[, 3])
}
k <- 2:10
avg_sil <- sapply(k, silhouette_score)
which returns:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
It seems kmeans doesn't like the date column, so you may want to exclude it:
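library(cluster)  # provides silhouette()
silhouette_score <- function(k) {
  # k must be at most nrow(FinDat) - 1
  stopifnot(k <= nrow(FinDat) - 1)
  # drop the first (date) column before clustering
  km <- kmeans(FinDat[-1], centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(FinDat[-1]))
  # average silhouette width, named by k
  setNames(mean(ss[, 3]), k)
}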
k <- 2:5
avg_sil <- sapply(k, silhouette_score)
avg_sil
# 2 3 4 5
# 0.3791762 0.3302388 0.2735529 0.2133566
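To see why the date column triggers the error, here is a minimal sketch on a small made-up data frame (the name toy and its values are illustrative, not the real FinDat). kmeans passes its input through as.matrix, and a data frame that mixes a POSIXct column with numeric columns becomes a character matrix; forcing that to double then yields NA wherever the text is not a number.

```r
# Toy data frame mimicking FinDat's layout (illustrative values only)
toy <- data.frame(
  date = as.POSIXct(c("2009-01-05", "2009-01-06", "2009-01-07"), tz = "UTC"),
  SP   = c(-0.00468, 0.00779, -0.0305),
  DAX  = c(0.00219, 0.00846, -0.0178)
)

# A POSIXct column makes as.matrix() fall back to a character matrix
m_chr <- as.matrix(toy)
typeof(m_chr)

# Coercing to double turns the formatted dates into NA
# (this is the step that emits "NAs introduced by coercion")
m_num <- m_chr
suppressWarnings(storage.mode(m_num) <- "double")
anyNA(m_num[, "date"])
```

Dropping the column with FinDat[-1], as above, avoids the coercion entirely.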
Or convert all columns to numeric using data.matrix:
silhouette_score2 <- function(k) {
  stopifnot(k <= nrow(FinDat) - 1)
  # numeric matrix copy; the date column becomes a plain number
  mat <- data.matrix(FinDat)
  km <- kmeans(mat, centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(mat))
  # average silhouette width, named by k
  setNames(mean(ss[, 3]), k)
}
k <- 2:5
avg_sil <- sapply(k, silhouette_score2)
avg_sil
# 2 3 4 5
# 0.40783229 0.37777778 0.21111111 0.08333333
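One caveat with data.matrix: a POSIXct column is converted to seconds since 1970-01-01, which is many orders of magnitude larger than the return values, so it can dominate the Euclidean distances. The sketch below, on a made-up data frame (toy and its values are assumptions for illustration), compares distances on the full matrix with distances on the date column alone, and shows scale() as one possible way to put the columns on a comparable footing.

```r
# Toy data frame (illustrative values only)
toy <- data.frame(
  date = as.POSIXct(c("2009-01-05", "2009-01-06", "2009-01-07", "2009-01-08"),
                    tz = "UTC"),
  SP   = c(-0.00468, 0.00779, -0.0305, 0.00339)
)

m <- data.matrix(toy)          # date is now seconds since the epoch
d_all  <- dist(m)              # distances using both columns
d_date <- dist(m[, "date", drop = FALSE])  # distances using the date only

# The two are numerically indistinguishable: the date swamps the returns
same <- isTRUE(all.equal(as.vector(d_all), as.vector(d_date)))
same

# Standardising each column (mean 0, sd 1) is one way to balance them
m_scaled <- scale(m)
```

Whether the date belongs in the clustering at all depends on the goal; if only the return series matter, excluding the column as in the first approach is the simpler choice.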
Data:
FinDat <- structure(list(date = structure(c(1231110000, 1231196400, 1231282800,
1231369200, 1231455600, 1231714800), class = c("POSIXct", "POSIXt"
), tzone = ""), ISE...2 = c(0.0358, 0.0254, -0.0289, -0.0622,
0.00986, -0.0292), ISE...3 = c(0.0384, 0.0318, -0.0264, -0.0847,
0.00966, -0.0424), SP = c(-0.00468, 0.00779, -0.0305, 0.00339,
-0.0215, -0.0228), DAX = c(0.00219, 0.00846, -0.0178, -0.0117,
-0.0199, -0.0135), FTSE = c(0.00389, 0.0129, -0.0287, -0.000466,
-0.0127, -0.00503), NIKKEI = c(0, 0.00416, 0.0173, -0.0401, -0.00447,
-0.049), BOVESPA = c(0.0312, 0.0189, -0.0359, 0.0283, -0.00976,
-0.0538), EU = c(0.0127, 0.0113, -0.0171, -0.00556, -0.011, -0.0125
)), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")