Error calculating distances with CLARA function in R
I am creating a market segmentation of consumers by clustering them into 3 categories. I am using the cluster CRAN package with the CLARA clustering algorithm.
The data has 12901 observations of 34 variables, which take ordinal and NA values.
The ordinal values do not have the same increments between categories. For example, in the HouseholdIncome column, the categories are "0-15k", "15k-25k", "25k-35k", "35k-50k", "50k-75k", "75k-100k", "100k-125k", "125k-150k", "150k-175k", "175k-200k", "200k-250k", "250k+".
Every row has at least 1 non-missing value.
> which(rowSums(is.na(Store2df))==ncol(Store2df))
named integer(0)
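The check above confirms that no row is entirely NA, but a distance between two rows also needs at least one variable that is observed in both of them. A minimal sketch of that pairwise check follows, on a small made-up data frame (toy_df and its values are hypothetical, not taken from Store2df):

```r
# Hypothetical toy data: rows 1 and 2 have no column observed in both.
toy_df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, 2, 4)
)

# observed[i, j] is 1 when row i has a non-NA value in column j, else 0.
observed <- (!is.na(as.matrix(toy_df))) + 0

# shared[i, k] counts the columns observed in both row i and row k.
shared <- tcrossprod(observed)

# Pairs (i < k) with zero shared columns cannot be given a distance.
no_overlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(no_overlap)
```

The shared matrix has one entry per pair of rows, so on a data set of this size it is probably more practical to run the check on a random subset of rows.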
Here are the first five observations of the first seven variables.
> head(Store2df, n=5)
Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1 <NA> Male <NA> <NA> <NA> <NA> <NA>
2 45-54 Female <NA> <NA> <NA> <NA> <NA>
5 45-54 Female 75k-100k Married Yes Own 150k-200k
6 25-34 Male 75k-100k Married No Own 300k-350k
7 35-44 Female 125k-150k Married Yes Own 250k-300k
Here's the code for the clara function:
> library(cluster)
> #Clara algorithm
> #Set seed for reproducibility
> set.seed(1)
> #Changing medoids.x and keep.data = TRUE - new way
> client2.clara <- clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,
+ sampsize = (2500), medoids.x = TRUE, keep.data = TRUE,
+ rngR = TRUE, pamLike = TRUE)
#Error in clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5, :
#Each of the random samples contains objects between which no distance can be computed.
Please let me know if I can provide more information.
Source code for CLARA:
ndyst = as.integer(if(metric == "manhattan") 2 else 1),
Each of the random samples contains objects between which no distance can be computed.
Take this error message seriously: metric = "manhattan" is not defined for categorical variables. Manhattan and Euclidean distances operate on numeric vectors (which should also be linearly scaled, e.g. not angles or log-scaled values).
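One way to act on this is to map each ordered category to a number before calling clara. The sketch below is only an illustration under stated assumptions: toy_df is made-up data, and the bracket midpoints (income in thousands, age in years) are my own choice of numeric coding, not something given in the question. Open-ended brackets such as "250k+" have no midpoint and would need a value chosen by hand.

```r
library(cluster)

# Hypothetical toy data using a few of the brackets from the question.
income_levels <- c("0-15k", "15k-25k", "25k-35k", "35k-50k")
age_levels <- c("25-34", "35-44", "45-54")
set.seed(1)
toy_df <- data.frame(
  HouseholdIncome = factor(sample(income_levels, 60, replace = TRUE),
                           levels = income_levels, ordered = TRUE),
  Age = factor(sample(age_levels, 60, replace = TRUE),
               levels = age_levels, ordered = TRUE)
)

# Assumed numeric coding: the midpoint of each bracket.
income_mid <- c("0-15k" = 7.5, "15k-25k" = 20, "25k-35k" = 30, "35k-50k" = 42.5)
age_mid <- c("25-34" = 29.5, "35-44" = 39.5, "45-54" = 49.5)

# Look up the midpoint for each observation by its category label.
toy_num <- data.frame(
  HouseholdIncome = unname(income_mid[as.character(toy_df$HouseholdIncome)]),
  Age = unname(age_mid[as.character(toy_df$Age)])
)

# Cluster the numeric version; stand = TRUE puts both columns on a common scale.
fit <- clara(toy_num, k = 3, metric = "manhattan", stand = TRUE,
             samples = 5, sampsize = 30, rngR = TRUE, pamLike = TRUE)
print(fit$medoids)
```

Midpoints keep the unequal spacing between brackets, which plain integer codes from as.integer() would discard. For unordered columns such as Gender, daisy() from the same package with metric = "gower" followed by pam() is another option, at the cost of building a full dissimilarity matrix.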