kmeans returns an error for my time-series data sets in R

I have a time-series data set. The data is available in Excel format here. I would like to cluster the data using k-means, but I get an error.

**Please note that FinDat is my data from the attached source.**

  > head(FinDat)
# A tibble: 6 x 10
  date                 ISE...2  ISE...3       SP      DAX     FTSE   NIKKEI  BOVESPA       EU
  <dttm>                 <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 2009-01-05 00:00:00  0.0358   0.0384  -0.00468  0.00219  3.89e-3  0        0.0312   0.0127 
2 2009-01-06 00:00:00  0.0254   0.0318   0.00779  0.00846  1.29e-2  0.00416  0.0189   0.0113 
3 2009-01-07 00:00:00 -0.0289  -0.0264  -0.0305  -0.0178  -2.87e-2  0.0173  -0.0359  -0.0171 
4 2009-01-08 00:00:00 -0.0622  -0.0847   0.00339 -0.0117  -4.66e-4 -0.0401   0.0283  -0.00556
5 2009-01-09 00:00:00  0.00986  0.00966 -0.0215  -0.0199  -1.27e-2 -0.00447 -0.00976 -0.0110 
6 2009-01-12 00:00:00 -0.0292  -0.0424  -0.0228  -0.0135  -5.03e-3 -0.0490  -0.0538  -0.0125 
# ... with 1 more variable: EM <dbl>

silhouette_score <- function(k){
  km <- kmeans(FinDat, centers = k, nstart=25)
  ss <- silhouette(km$cluster, dist(FinDat))
  mean(ss[, 3])
}
k <- 2:10
avg_sil <- sapply(k, silhouette_score)

which returns:

        Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In storage.mode(x) <- "double" : NAs introduced by coercion

It seems kmeans doesn't like the date column: the coercion warning suggests it cannot be turned into numbers, which leaves NAs in the input. You may want to exclude it.

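library(cluster)  # provides silhouette()
silhouette_score <- function(k) {
  # silhouette() needs k <= n - 1 clusters
  stopifnot(k <= nrow(FinDat) - 1)
  X <- FinDat[-1]  # drop the date column (first column)
  km <- kmeans(X, centers=k, nstart=25)
  ss <- silhouette(km$cluster, dist(X))
  # column 3 of the silhouette matrix holds the silhouette widths
  return(setNames(mean(ss[, 3]), k))
}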

k <- 2:5
avg_sil <- sapply(k, silhouette_score)
avg_sil
#         2         3         4         5 
# 0.3791762 0.3302388 0.2735529 0.2133566 
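
To see why the date column triggers the error, here is a minimal sketch in base R. It assumes kmeans converts its input with as.matrix() and then sets the storage mode to double, which is what the warning message in the question suggests. The small data frame df is made up for illustration and is not the original data.

```r
# Hypothetical data frame: a POSIXct date plus one numeric column.
df <- data.frame(
  date = as.POSIXct(c("2009-01-05", "2009-01-06", "2009-01-07"), tz = "UTC"),
  SP   = c(-0.00468, 0.00779, -0.0305)
)

# A data frame with a date-time column becomes a character matrix.
m <- as.matrix(df)
is.character(m)  # TRUE

# Forcing that matrix to double cannot parse the date strings, so they
# become NA (this is where "NAs introduced by coercion" comes from).
m_num <- m
suppressWarnings(storage.mode(m_num) <- "double")
anyNA(m_num[, "date"])  # TRUE

# Without the date column the matrix is numeric from the start.
is.numeric(as.matrix(df[-1]))  # TRUE
```

The NA values are what kmeans then rejects with "NA/NaN/Inf in foreign function call".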

Or convert all columns to numeric using data.matrix.

silhouette_score2 <- function(k) {
  stopifnot(k <= nrow(FinDat) - 1)
  X <- data.matrix(FinDat)  # all columns as numeric, including the date
  km <- kmeans(X, centers=k, nstart=25)
  ss <- silhouette(km$cluster, dist(X))
  return(setNames(mean(ss[, 3]), k))
}

k <- 2:5
avg_sil <- sapply(k, silhouette_score2)
avg_sil
#          2          3          4          5 
# 0.40783229 0.37777778 0.21111111 0.08333333

Data:

FinDat <- structure(list(date = structure(c(1231110000, 1231196400, 1231282800, 
1231369200, 1231455600, 1231714800), class = c("POSIXct", "POSIXt"
), tzone = ""), ISE...2 = c(0.0358, 0.0254, -0.0289, -0.0622, 
0.00986, -0.0292), ISE...3 = c(0.0384, 0.0318, -0.0264, -0.0847, 
0.00966, -0.0424), SP = c(-0.00468, 0.00779, -0.0305, 0.00339, 
-0.0215, -0.0228), DAX = c(0.00219, 0.00846, -0.0178, -0.0117, 
-0.0199, -0.0135), FTSE = c(0.00389, 0.0129, -0.0287, -0.000466, 
-0.0127, -0.00503), NIKKEI = c(0, 0.00416, 0.0173, -0.0401, -0.00447, 
-0.049), BOVESPA = c(0.0312, 0.0189, -0.0359, 0.0283, -0.00976, 
-0.0538), EU = c(0.0127, 0.0113, -0.0171, -0.00556, -0.011, -0.0125
)), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
