Applying the klaR kmodes algorithm to the dataset below
> summary(raw)
CREDIT_LIMIT     CP             gender    IE_CHILD_NB  IE_TOT_DEP_NB  TOTAL_INCOME    IE_HOUSE_CHARGE  maritial
>2000    :  612  11500  :  145  MM: 5435  0:7432       0:1446         >2000    :3524  >2000    :    2  D   : 1195
0-500    :10458  11100  :   90  MR:12983  1:4119       1:3748         0-500    :1503  0-500    :17146  M   :10507
1000-1500: 2912  08830  :   71            2:5787       2:3386         1000-1500:6649  1000-1500:   44  MISS: 1446
1500-2000: 2254  11406  :   68            3: 947       3:3740         1500-2000:4116  1500-2000:    5  Ot  : 1043
500-1000 : 2182  35018  :   66            4: 133       4:6098         500-1000 :2626  500-1000 : 1221  S   : 4227
                 11510  :   62
                 (Other):17916
new_age     job_age
>70  : 295  0-20 :14627
0-30 : 815  20-30: 1986
30-40:4867  30-40:  612
40-50:7293  40-50:  124
50-60:3883  50-60: 1069
60-70:1265
I get the following error
> cluster.results <-kmodes(data=raw, modes=4, iter.max = 10, weighted=FALSE )
Error: Column index must be at most 5 if positive, not 6
Any idea what this error is about?
Bests
Partial answer for anyone searching for this error: it means that somewhere an object is being asked to return elements outside its range, such as more columns than exist, e.g.:
> library(tibble)
> aa <- tibble(bb = c(1,2))
> aa
# A tibble: 2 x 1
bb
<dbl>
1 1.00
2 2.00
> aa[,2]
Error: Column index must be at most 1 if positive, not 2
I'm not sure of the exact source of the error in this case. It doesn't occur with lists or data frames (data frames return "undefined columns selected", and lists return NULL), and I don't use that package.
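For comparison, here is a minimal base R sketch of the data frame and list behaviour mentioned above. The objects `df` and `lst` are made up for illustration, and I'm assuming the list case refers to `$` access on a missing name.

```r
# Base R only; `df` and `lst` are hypothetical objects for illustration.
df  <- data.frame(bb = c(1, 2))
lst <- list(bb = c(1, 2))

# Out-of-range column on a plain data frame: still an error,
# but with a different message than the tibble one.
df_msg <- tryCatch(df[, 2], error = function(e) conditionMessage(e))
print(df_msg)

# Missing element on a list (accessed with `$`): no error, just NULL.
lst_val <- lst$cc
print(lst_val)
```

So the "Column index must be at most ..." wording is specific to tibbles, which suggests the object passed to kmodes was a tibble rather than a plain data frame.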
I experienced the same problem when trying to use kmodes to cluster the following categorical data frame:
> summary(raw_df)
Age           Years_At_Present_Employment  Marital_Status_Gender  Dependents  Housing    Job
(0,20] :  80  A71: 310                     A91: 250               1:4225      A151: 895  A171: 110
(20,30]:1975  A72: 860                     A92:1550               2: 775      A152:3565  A172:1000
(30,45]:2015  A73:1695                     A93:2740                           A153: 540  A173:3150
(45,60]: 705  A74: 870                     A94: 460                                      A174: 740
(60,75]: 225  A75:1265
Foreign_Worker  Current_Address_Yrs  Telephone
A201:4815       Min.   :1.000        A191:2980
A202: 185       1st Qu.:2.000        A192:2020
                Median :3.000
                Mean   :2.845
                3rd Qu.:4.000
                Max.   :4.000
Then I got the error
> (raw_clusters <- klaR::kmodes(raw_df, 5))
Error: Column index must be at most 4 if positive, not 6
It seems that this implementation of kmodes (klaR) requires the categorical variables to be numeric, so you need to convert the variables from factors to numeric (keeping in mind that they are still categorical):
library(dplyr)  # provides %>% and mutate()

# as.numeric() on a factor returns its integer level codes, not the labels
raw_4clust <- raw_df %>%
  mutate(
    Age = as.numeric(Age),
    Years_At_Present_Employment = as.numeric(Years_At_Present_Employment),
    Marital_Status_Gender = as.numeric(Marital_Status_Gender),
    Housing = as.numeric(Housing),
    Job = as.numeric(Job),
    Foreign_Worker = as.numeric(Foreign_Worker),
    Telephone = as.numeric(Telephone)
  )
After that it worked for me.
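If there are many factor columns, the same conversion can be done without naming each one. This is only a sketch in base R on a small made-up data frame (the column values below are invented, not the real data), and it assumes that replacing every factor by its integer level codes is what you want:

```r
# Toy stand-in for raw_df; the values are invented for illustration.
raw_df <- data.frame(
  Housing             = factor(c("A151", "A152", "A153", "A152")),
  Telephone           = factor(c("A191", "A192", "A191", "A191")),
  Current_Address_Yrs = c(1, 4, 2, 3)
)

# Find the factor columns and replace each with its integer level codes.
is_fac <- vapply(raw_df, is.factor, logical(1))
raw_4clust <- raw_df
raw_4clust[is_fac] <- lapply(raw_df[is_fac], as.integer)

str(raw_4clust)
```

Non-factor columns such as Current_Address_Yrs are left untouched. The codes are just labels for the levels, so their ordering and distances carry no meaning.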
Hope that helps
In my case, I had used dplyr for the data transformation, so what I did was convert my object to a data frame:
tmp = as.data.frame(tmp)
And that solved my problem.
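A small sketch of what that conversion changes, assuming `tmp` was a tibble (which the wording of the error message suggests). The toy columns here are made up, and the kmodes call is left as a comment rather than run:

```r
library(tibble)

# Hypothetical tibble standing in for the real data.
tmp <- tibble(
  gender = factor(c("MM", "MR", "MR", "MM")),
  job    = factor(c("a", "b", "a", "b"))
)
class_before <- class(tmp)   # includes "tbl_df"

# Drop the tibble classes; the columns stay factors.
tmp <- as.data.frame(tmp)
class_after <- class(tmp)    # plain "data.frame"

print(class_before)
print(class_after)

# The converted object can then be passed on, e.g.:
# cluster.results <- klaR::kmodes(tmp, modes = 2)
```

Unlike the as.numeric approach, this keeps the factor columns as they are and only changes how the object responds to indexing.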