
klaR package kmodes error: "Column index must be at most 5 if positive, not 6"

I am applying the klaR kmodes algorithm to the dataset below:

> summary(raw)
    CREDIT_LIMIT         CP        gender     IE_CHILD_NB IE_TOT_DEP_NB    TOTAL_INCOME   IE_HOUSE_CHARGE  maritial    
 >2000    :  612   11500  :  145   MM: 5435   0:7432      0:1446        >2000    :3524   >2000    :    2   D   : 1195  
 0-500    :10458   11100  :   90   MR:12983   1:4119      1:3748        0-500    :1503   0-500    :17146   M   :10507  
 1000-1500: 2912   08830  :   71              2:5787      2:3386        1000-1500:6649   1000-1500:   44   MISS: 1446  
 1500-2000: 2254   11406  :   68              3: 947      3:3740        1500-2000:4116   1500-2000:    5   Ot  : 1043  
 500-1000 : 2182   35018  :   66              4: 133      4:6098        500-1000 :2626   500-1000 : 1221   S   : 4227  
                   11510  :   62                                                                                       
                   (Other):17916                                                                                       
  new_age      job_age     
 >70  : 295   0-20 :14627  
 0-30 : 815   20-30: 1986  
 30-40:4867   30-40:  612  
 40-50:7293   40-50:  124  
 50-60:3883   50-60: 1069  
 60-70:1265              

I get the following error:

> cluster.results <-kmodes(data=raw, modes=4, iter.max = 10, weighted=FALSE )
Error: Column index must be at most 5 if positive, not 6

Any idea what this error means?

Bests

Partial answer for anyone searching for this error: it means that somewhere an object is being asked to return elements outside its range, such as more columns than exist, e.g.:

> aa <- tibble(bb = c(1,2))
> aa
# A tibble: 2 x 1
     bb
  <dbl>
1  1.00
2  2.00
> aa[,2]
Error: Column index must be at most 1 if positive, not 2

I'm not sure of the exact source of the error in this case, and I don't use that package. As the example shows, this message appears when indexing a tibble; it doesn't occur with plain data frames and lists (data frames return "undefined columns selected", and lists return NULL).
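For comparison, here is a minimal base-R sketch of the behaviour described above. The objects df and lst are made up for illustration, and the exact wording of the data frame message assumes an English locale.

```r
# Base R only; no packages required.
df  <- data.frame(bb = c(1, 2))   # hypothetical one-column data frame
lst <- list(bb = c(1, 2))         # hypothetical one-element list

# A plain data frame raises its own error for a column that does not exist.
df_msg <- tryCatch(df[, 2], error = function(e) conditionMessage(e))
print(df_msg)   # "undefined columns selected" in an English locale

# A list quietly returns NULL when a missing name is requested.
lst_val <- lst[["cc"]]
print(is.null(lst_val))   # TRUE
```

So if you see "Column index must be at most N if positive", the object being indexed is most likely a tibble rather than a plain data frame.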

I experienced the same problem when trying to use kmodes to cluster the following categorical data frame:

 > summary(raw_df)
  Age       Years_At_Present_Employment Marital_Status_Gender Dependents Housing       Job      
  (0,20] :  80   A71: 310                    A91: 250              1:4225     A151: 895   A171: 110  
  (20,30]:1975   A72: 860                    A92:1550              2: 775     A152:3565   A172:1000  
  (30,45]:2015   A73:1695                    A93:2740                         A153: 540   A173:3150  
  (45,60]: 705   A74: 870                    A94: 460                                     A174: 740  
  (60,75]: 225   A75:1265                                                                            

  Foreign_Worker Current_Address_Yrs Telephone  
  A201:4815      Min.   :1.000       A191:2980  
  A202: 185      1st Qu.:2.000       A192:2020  
                 Median :3.000                  
                 Mean   :2.845                  
                 3rd Qu.:4.000                  
                 Max.   :4.000  

Then I got the error:

 > (raw_clusters <- klaR::kmodes(raw_df, 5))
 Error: Column index must be at most 4 if positive, not 6

It seems that this implementation of kmodes (klaR) requires the categorical variables to be numeric, so you need to convert the variables from factors to numeric (keeping in mind that they are still really categorical):

library(dplyr)  # provides %>% and mutate()

# as.numeric() on a factor returns its integer level codes, not its labels
raw_4clust <- raw_df %>%
  mutate(
    Age = as.numeric(Age),
    Years_At_Present_Employment = as.numeric(Years_At_Present_Employment),
    Marital_Status_Gender = as.numeric(Marital_Status_Gender),
    Housing = as.numeric(Housing),
    Job = as.numeric(Job),
    Foreign_Worker = as.numeric(Foreign_Worker),
    Telephone = as.numeric(Telephone)
  )

After that it worked for me.
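A small sketch of what that conversion does may help when interpreting the resulting clusters. The vectors f and g are invented for illustration; only base R is used.

```r
# Hypothetical factor using labels similar to those in the data above
f <- factor(c("A73", "A71", "A73", "A75"))

# as.numeric() returns the integer level codes (levels are sorted: A71, A73, A75)
codes <- as.numeric(f)
print(codes)              # 2 1 2 3

# The original labels can be recovered from the codes
print(levels(f)[codes])   # "A73" "A71" "A73" "A75"

# Pitfall: when the labels themselves look like numbers, the codes are NOT the labels
g <- factor(c("10", "20", "10"))
g_codes  <- as.numeric(g)                 # 1 2 1
g_values <- as.numeric(as.character(g))   # 10 20 10
print(g_codes)
print(g_values)
```

The codes are only identifiers for the categories; their order and spacing carry no meaning, which is why the data should still be treated as categorical.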

Hope that helps

In my case, I had used dplyr for the data transformation, so what I did was convert my object to a plain data frame:

tmp = as.data.frame(tmp)

And that solved my problem.
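A hedged sketch of how one might check for this before clustering. The toy contents of tmp are hypothetical, and the tibble step only runs if the tibble package happens to be installed.

```r
# Hypothetical toy data standing in for `tmp`
tmp <- data.frame(
  a = factor(c("x", "y", "x")),
  b = factor(c("p", "p", "q"))
)

# dplyr pipelines often hand back a tibble; mimic that here if tibble is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tmp <- tibble::as_tibble(tmp)
}
print(class(tmp))   # includes "tbl_df" when the object is a tibble

# Convert back to a plain data frame before passing it on
tmp <- as.data.frame(tmp)
print(class(tmp))   # "data.frame"

is_plain <- !inherits(tmp, "tbl_df")
print(is_plain)     # TRUE
```

Checking class() on the object first tells you whether this conversion is the relevant fix in your case.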

