Error in package klaR kmodes: Error: Column index must be at most 5 if positive, not 6

Question

I am applying the klaR kmodes algorithm to the following dataset:

> summary(raw)
    CREDIT_LIMIT         CP        gender     IE_CHILD_NB IE_TOT_DEP_NB    TOTAL_INCOME   IE_HOUSE_CHARGE  maritial    
 >2000    :  612   11500  :  145   MM: 5435   0:7432      0:1446        >2000    :3524   >2000    :    2   D   : 1195  
 0-500    :10458   11100  :   90   MR:12983   1:4119      1:3748        0-500    :1503   0-500    :17146   M   :10507  
 1000-1500: 2912   08830  :   71              2:5787      2:3386        1000-1500:6649   1000-1500:   44   MISS: 1446  
 1500-2000: 2254   11406  :   68              3: 947      3:3740        1500-2000:4116   1500-2000:    5   Ot  : 1043  
 500-1000 : 2182   35018  :   66              4: 133      4:6098        500-1000 :2626   500-1000 : 1221   S   : 4227  
                   11510  :   62                                                                                       
                   (Other):17916                                                                                       
  new_age      job_age     
 >70  : 295   0-20 :14627  
 0-30 : 815   20-30: 1986  
 30-40:4867   30-40:  612  
 40-50:7293   40-50:  124  
 50-60:3883   50-60: 1069  
 60-70:1265

I get the following error:

> cluster.results <-kmodes(data=raw, modes=4, iter.max = 10, weighted=FALSE )
Error: Column index must be at most 5 if positive, not 6

Any ideas about what causes this error?

Best regards

Answer 1

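A partial answer for anyone searching for this error: it means that an object was asked to return an element outside its range, for instance a column index larger than the number of columns the object has. For example: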

> aa <- tibble(bb = c(1,2))
> aa
# A tibble: 2 x 1
     bb
  <dbl>
1  1.00
2  2.00
> aa[,2]
Error: Column index must be at most 1 if positive, not 2

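In this case I am not sure where the error originates. Neither lists nor data frames produce this error (data frames return `undefined columns selected`, lists return `NULL`), and I do not use this package.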
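To make the comparison above concrete, here is a minimal sketch of how the three container types react to an out-of-range column index. `raises_error` is a hypothetical helper written for this illustration, and the tibble part only runs if the tibble package happens to be installed. The exact wording of the tibble error differs between tibble versions, so the sketch only checks whether an error is raised.

```r
# Hypothetical helper (not from any package): TRUE if evaluating
# the expression raises an error, FALSE otherwise.
raises_error <- function(expr) {
  tryCatch({
    force(expr)
    FALSE
  }, error = function(e) TRUE)
}

df  <- data.frame(bb = c(1, 2))
lst <- list(bb = c(1, 2))

# Data frame: asking for column 2 of a one-column data frame is an error.
raises_error(df[, 2])

# List: single-bracket indexing past the end gives a list holding NULL.
lst[2]

# Tibble: also an error, but with its own message (only if tibble is installed).
if (requireNamespace("tibble", quietly = TRUE)) {
  aa <- tibble::tibble(bb = c(1, 2))
  raises_error(aa[, 2])
}

# Checking the number of columns before indexing avoids the problem.
ncol(df)
```

Because the message in the question matches the tibble-style wording, checking `class()` of the object passed to `kmodes` is a reasonable first diagnostic step.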

Answer 2

I ran into the same problem when trying to cluster the following categorical data frame with kmodes:

 > summary(raw_df)
  Age       Years_At_Present_Employment Marital_Status_Gender Dependents Housing       Job      
  (0,20] :  80   A71: 310                    A91: 250              1:4225     A151: 895   A171: 110  
  (20,30]:1975   A72: 860                    A92:1550              2: 775     A152:3565   A172:1000  
  (30,45]:2015   A73:1695                    A93:2740                         A153: 540   A173:3150  
  (45,60]: 705   A74: 870                    A94: 460                                     A174: 740  
  (60,75]: 225   A75:1265                                                                            

  Foreign_Worker Current_Address_Yrs Telephone  
  A201:4815      Min.   :1.000       A191:2980  
  A202: 185      1st Qu.:2.000       A192:2020  
                 Median :3.000                  
                 Mean   :2.845                  
                 3rd Qu.:4.000                  
                 Max.   :4.000

Then I got the error:

 > (raw_clusters <- klaR::kmodes(raw_df, 5))
 Error: Column index must be at most 4 if positive, not 6

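It seems that this implementation of kmodes (klaR) requires the categorical variables to be numeric, so you need to convert the variables from factor to numeric (bearing in mind that they are still categorical variables):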

library(dplyr)  # provides %>% and mutate()

# as.numeric() on a factor returns its level codes, not the labels
raw_4clust <- raw_df %>%
  mutate(
    Age = as.numeric(Age),
    Years_At_Present_Employment = as.numeric(Years_At_Present_Employment),
    Marital_Status_Gender = as.numeric(Marital_Status_Gender),
    Housing = as.numeric(Housing),
    Job = as.numeric(Job),
    Foreign_Worker = as.numeric(Foreign_Worker),
    Telephone = as.numeric(Telephone)
  )

After that, it worked for me.

Hope this helps.
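The factor-to-numeric conversion described in this answer can also be written without naming every column. The following is a minimal base R sketch on invented toy data (the column names are borrowed from the summary above, the values are made up). It also shows that `as.numeric()` on a factor yields level codes, which can be mapped back to the labels when interpreting the cluster modes.

```r
# Toy data, invented for illustration only.
raw_df <- data.frame(
  Age                 = factor(c("(20,30]", "(30,45]", "(20,30]")),
  Housing             = factor(c("A152", "A151", "A152")),
  Current_Address_Yrs = c(1, 4, 2)
)

# Find the factor columns and replace each with its numeric level codes.
is_fac <- vapply(raw_df, is.factor, logical(1))
converted <- raw_df
converted[is_fac] <- lapply(raw_df[is_fac], as.numeric)

str(converted)

# The codes index into the original levels, so labels can be recovered.
levels(raw_df$Housing)[converted$Housing]
```

The codes carry no ordering or distance information, so they should be treated as labels only.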

Answer 3

In my case, I had already used dplyr to transform the data, so all I did was convert the object to a data frame:

tmp = as.data.frame(tmp)

That solved my problem.
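As a sketch of the conversion in this answer, the snippet below wraps it in a small guard so that plain data frames pass through unchanged. `to_plain_df` is a hypothetical helper name, the toy data is invented, and the tibble step only runs if the tibble package is installed. It assumes, as this answer suggests, that the object coming out of the dplyr pipeline was a tibble.

```r
# Hypothetical helper (not part of klaR or dplyr):
# drop the tibble classes if present, otherwise return the input as is.
to_plain_df <- function(x) {
  if (inherits(x, "tbl_df")) as.data.frame(x) else x
}

# Toy categorical data, invented for illustration only.
toy <- data.frame(
  gender   = factor(c("MM", "MR", "MR", "MM", "MR", "MM")),
  maritial = factor(c("S", "M", "M", "D", "S", "M"))
)

# If tibble is installed, mimic what a dplyr pipeline often returns.
if (requireNamespace("tibble", quietly = TRUE)) {
  toy <- tibble::as_tibble(toy)
}

plain <- to_plain_df(toy)
class(plain)

# plain can then be passed on as in the question, e.g.
# klaR::kmodes(data = plain, modes = 2, iter.max = 10, weighted = FALSE)
```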

Answer 1  0  2018-06-19 06:51:41
Answer 2  0  2018-10-25 11:21:37
Answer 3  0  2018-10-30 12:43:18
