
Error calculating distances with CLARA function in R

I am creating a market segmentation of consumers by clustering them into 3 categories. I am using the cluster CRAN package with the CLARA clustering algorithm.

The data has 12901 observations of 34 variables, each taking ordinal values or NA.

The ordinal values do not have equal increments between categories. For example, in the HouseholdIncome column, the categories are "0-15k", "15k-25k", "25k-35k", "35k-50k", "50k-75k", "75k-100k", "100k-125k", "125k-150k", "150k-175k", "175k-200k", "200k-250k", "250k+".

Every row has at least 1 non-missing value.

> which(rowSums(is.na(Store2df))==ncol(Store2df))
named integer(0)
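
Note that this row-wise check only shows that no row is entirely NA. A pairwise distance also needs each pair of rows to share at least one non-missing variable, and the check above does not test that. A minimal sketch of such a pairwise check on a small made-up data frame (`toy`, `observed`, `shared`, and `no_overlap` are hypothetical names, and the values are not taken from the real data):

```r
# Toy data: row 1 has only Gender, row 2 has only Age (made-up values)
toy <- data.frame(
  Age    = c(NA, "45-54", "25-34"),
  Gender = c("Male", NA, "Female"),
  stringsAsFactors = TRUE
)

# The row-wise check passes: no row is entirely NA
all_na_rows <- which(rowSums(is.na(toy)) == ncol(toy))

# observed[i, j] is 1 when row i has a value for variable j, else 0
observed <- (!is.na(toy)) * 1

# shared[i, k] counts the variables observed in both row i and row k
shared <- observed %*% t(observed)

# Pairs of rows (upper triangle only) with no variable in common
no_overlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(no_overlap)
```

On the full data the `shared` matrix has one entry per pair of rows, so it may be more practical to run this on a random subset of rows.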

Here are the first five rows of the first seven variables.

> head(Store2df, n=5)
    Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1  <NA>   Male            <NA>          <NA>               <NA>            <NA>            <NA>
2 45-54 Female            <NA>          <NA>               <NA>            <NA>            <NA>
5 45-54 Female        75k-100k       Married                Yes             Own       150k-200k
6 25-34   Male        75k-100k       Married                 No             Own       300k-350k
7 35-44 Female       125k-150k       Married                Yes             Own       250k-300k

Here is the call to the clara function:

> library(cluster)
> #Clara algorithm
> #Set seed for reproducibility
> set.seed(1)
> #Changing medoids.x and keep.data = TRUE - new way 
> client2.clara <- clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,
+                        sampsize = (2500), medoids.x = TRUE, keep.data = TRUE, 
+                        rngR = TRUE, pamLike = TRUE)
#Error in clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,  : 
  #Each of the random samples contains objects between which no distance can be computed.

Please let me know if I can provide more information.

Relevant line from the CLARA source code:

ndyst = as.integer(if(metric == "manhattan") 2 else 1),

Each of the random samples contains objects between which no distance can be computed.

Take this error message seriously...

metric = "manhattan"

is not defined for categorical variables.

Manhattan and Euclidean distances operate on numeric vectors (which should also be on a linear scale, e.g. not angles or log-scaled values).
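
One dissimilarity that is defined for ordinal variables with missing values is Gower's coefficient, available through daisy() in the same cluster package. The following is a minimal sketch on made-up data, not the data from the question: `toy`, `age_levels`, and `income_levels` are hypothetical, the choice of k = 2 is arbitrary, and pam() stands in for clara() here because it accepts a precomputed dissimilarity.

```r
library(cluster)

# Hypothetical category orderings, lowest to highest
age_levels    <- c("25-34", "35-44", "45-54")
income_levels <- c("75k-100k", "100k-125k", "125k-150k")

# Made-up data with ordered factors and one missing value
toy <- data.frame(
  Age = factor(c("45-54", "25-34", "35-44", "45-54", "25-34", "35-44"),
               levels = age_levels, ordered = TRUE),
  HouseholdIncome = factor(c("75k-100k", "75k-100k", "125k-150k",
                             NA, "100k-125k", "125k-150k"),
                           levels = income_levels, ordered = TRUE)
)

# Gower dissimilarity: ordered factors are treated as ordinal variables,
# and a missing value is left out of the pairs it would affect
d <- daisy(toy, metric = "gower")

# Partitioning around medoids on the precomputed dissimilarities
fit <- pam(d, k = 2, diss = TRUE)
print(fit$clustering)
```

A full dissimilarity over all 12901 rows would hold roughly 83 million pairwise values, so memory use is worth checking before trying this on the complete data.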
