How to calculate fuzzy performance index and normalized classification entropy in R
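I am running fuzzy C-means clustering with the e1071 package. I want to determine the optimal number of clusters from the fuzzy performance index (FPI, the degree of fuzziness) and the normalized classification entropy (NCE, the degree of disorganization of a specific class), given by the following formulas,
where c is the number of clusters, n is the number of observations, μik is the fuzzy membership, and log a is the natural logarithm.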
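Written out so that they agree term by term with the FPI and NCE functions used in the code later in this question (the index convention, i for clusters and k for observations, is an assumption), the two indices would be:

```latex
\mathrm{FPI} = 1 - \frac{c}{c-1}\left[\,1 - \frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}^{2}\right]

\mathrm{NCE} = \frac{n}{n-c}\left[-\frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c}\mu_{ik}\,\log_{a}\mu_{ik}\right]
```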
I am using the following code:
library(e1071)
x <- rbind(matrix(rnorm(100,sd=0.3),ncol=2),
matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
cl <- cmeans(x,2,20,verbose=TRUE,method="cmeans")
cl$membership
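I have been able to extract μik, i.e. the fuzzy memberships. Now cmeans has to be run for different numbers of clusters, e.g. 2 to 6, and FPI and NCE have to be calculated for each run to produce a plot of both indices against the number of clusters.
How can this be done in R?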
Edit
I tried the code provided by @nya on the iris dataset, as follows:
df <- scale(iris[-5])
FPI <- function(cmem){
    c <- ncol(cmem)
    n <- nrow(cmem)
    1 - (c / (c - 1)) * (1 - sum(cmem^2) / n)
}
NCE <- function(cmem){
    c <- ncol(cmem)
    n <- nrow(cmem)
    (n / (n - c)) * (- sum(cmem * log(cmem)) / n)
}
# prepare variables
cl <- list()
fpi <- nce <- NULL
# cycle through the desired number of clusters
for(i in 2:6){
    cl[[i]] <- cmeans(df, i, 20, method = "cmeans")
    fpi <- c(fpi, FPI(cl[[i]]$membership))
    nce <- c(nce, NCE(cl[[i]]$membership))
}
# add space for the second axis label
par(mar = c(5,4,1,4) + .1)
# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")
# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)
# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)
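The optimal number of clusters is chosen by looking for the minimum values of the fuzzy performance index (FPI) and the normalized classification entropy (NCE). Here, NCE keeps increasing while FPI keeps decreasing. Ideally, the plot should look like the following (figure not preserved).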
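With the equations available, we can write our own functions. Here, the two functions use the equations from the paper you suggested and from one of the references cited by its authors.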
FPI <- function(cmem, method = c("McBrathney", "Rahul")){
    method <- match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)
    # Rahul et al. 2019. https://doi.org/10.1080/03650340.2019.1578345
    if(method == "Rahul"){
        res <- 1 - (C / (C - 1)) * (1 - sum(cmem^2) / N)
    }
    # McBrathney & Moore 1985 https://doi.org/10.1016/0168-1923(85)90082-6
    if(method == "McBrathney"){
        # F is the partition coefficient (mean of the squared memberships)
        F <- sum(cmem^2) / N
        # 0 for a crisp partition (F = 1), 1 when every membership is 1/C
        res <- 1 - (C * F - 1) / (C - 1)
    }
    return(res)
}
NCE <- function(cmem, method = c("McBrathney", "Rahul")){
    method <- match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)
    # Rahul et al. 2019: entropy scaled by N / (N - C)
    if(method == "Rahul"){
        res <- (N / (N - C)) * (- sum(cmem * log(cmem)) / N)
    }
    # McBrathney & Moore 1985: entropy H scaled by log(C)
    if(method == "McBrathney"){
        H <- - sum(cmem * log(cmem)) / N
        res <- H / log(C)
    }
    return(res)
}
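Before applying index functions to real output, it can help to check them on two extreme membership matrices where the answer is known. The sketch below is base R only and uses hypothetical helper names (partition_coef, fpi_standard, nce_logc) that are not part of e1071. It assumes the common definition FPI = 1 - (C*F - 1)/(C - 1) and the entropy scaled by log(C), and it treats 0 * log(0) as 0, which is an added convention so that crisp memberships do not produce NaN.

```r
# Hypothetical helpers written only for this check (not part of e1071).
partition_coef <- function(mu) sum(mu^2) / nrow(mu)

fpi_standard <- function(mu) {
  C <- ncol(mu)
  1 - (C * partition_coef(mu) - 1) / (C - 1)
}

nce_logc <- function(mu) {
  # Assumed convention: 0 * log(0) counts as 0, so hard memberships give no NaN.
  terms <- ifelse(mu > 0, mu * log(mu), 0)
  -sum(terms) / nrow(mu) / log(ncol(mu))
}

# 12 observations, 3 clusters
crisp   <- diag(3)[rep(1:3, each = 4), ]         # each row belongs fully to one cluster
uniform <- matrix(1 / 3, nrow = 12, ncol = 3)    # every membership equals 1/C

fpi_crisp   <- fpi_standard(crisp)     # crisp partition: no fuzziness
fpi_uniform <- fpi_standard(uniform)   # fully fuzzy partition
nce_crisp   <- nce_logc(crisp)
nce_uniform <- nce_logc(uniform)

print(c(fpi_crisp = fpi_crisp, fpi_uniform = fpi_uniform,
        nce_crisp = nce_crisp, nce_uniform = nce_uniform))
```

Under these assumptions both indices sit at 0 for the crisp matrix and at 1 (up to floating-point rounding) for the uniform one, so lower values mean a sharper partition.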
Then use FPI and NCE to calculate the indices from the memberships returned by the cmeans function on the iris dataset.
# prepare variables
cl <- list()
fpi <- nce <- NULL
# cycle through the desired number of clusters
for(i in 2:6){
    cl[[i]] <- cmeans(iris[, -5], i, 20, method = "cmeans")
    fpi <- c(fpi, FPI(cl[[i]]$membership, method = "M"))
    nce <- c(nce, NCE(cl[[i]]$membership, method = "M"))
}
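The question picks the number of clusters at which the indices are lowest. A minimal sketch of that selection step follows. best_k is a hypothetical helper, and the numbers in toy_index are made up purely to show the call; they are not results from iris or any other data.

```r
# Hypothetical helper: return the cluster count with the smallest index value.
best_k <- function(ks, index) {
  stopifnot(length(ks) == length(index))
  ks[which.min(index)]
}

# Toy values for illustration only (NOT computed from iris or any real data).
ks <- 2:6
toy_index <- c(0.30, 0.12, 0.18, 0.25, 0.40)

k_star <- best_k(ks, toy_index)
print(k_star)
```

With the vectors built in the loop, the equivalent calls would be best_k(2:6, fpi) and best_k(2:6, nce). If the two disagree, the plot is the better guide.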
Finally, plot both indices in one figure using two different axes.
# add space for the second axis label
par(mar = c(5,4,1,4) + .1)
# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")
# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)
# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)
Edit: updated the functions with the alternative equations from two different publications, and computed the example on the iris dataset.
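The two FPI variants are closely related, which may explain why the curves can trend in opposite directions depending on the equation used. Writing F for the partition coefficient, and assuming the usual form 1 - (CF - 1)/(C - 1) for the McBrathney variant, a short derivation shows that the "Rahul" form is its complement:

```latex
F = \frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{C}\mu_{ik}^{2}

\mathrm{FPI}_{\text{Rahul}}
  = 1 - \frac{C}{C-1}\,(1 - F)
  = \frac{(C - 1) - C + CF}{C-1}
  = \frac{CF - 1}{C-1}

\mathrm{FPI}_{\text{McBrathney}} = 1 - \frac{CF - 1}{C-1}
\quad\Longrightarrow\quad
\mathrm{FPI}_{\text{Rahul}} + \mathrm{FPI}_{\text{McBrathney}} = 1
```

Under that assumption, a minimum of one variant corresponds to a maximum of the other, so the variant in use needs to be stated when reading the plot.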
Hope this helps.
library(e1071)
library(dplyr)
library(tidyr)    # provides pivot_longer()
library(ggplot2)

# compute both indices from a fitted cmeans object
f <- function(cl) {
  C <- length(cl$size)
  N <- sum(cl$size)
  mu <- cl$membership
  fpi <- 1 - C / (C - 1) * (1 - sum((mu)^2) / N)
  nce <- N / (N - C) * (-sum(log(mu) * mu) / N)
  c(FPI = fpi, NCE = nce)
}

# numbers of clusters to try; x is the matrix from the question
K <- 2:6

data.frame(t(rbind(
  K = K,
  sapply(
    K,
    function(k) f(cmeans(x, k, 20, verbose = TRUE, method = "cmeans"))
  )
))) %>%
  pivot_longer(cols = FPI:NCE, names_to = "Index") %>%
  ggplot(aes(x = K, y = value, group = Index)) +
  geom_line(aes(linetype = Index, color = Index)) +
  geom_point() +
  scale_y_continuous(
    name = "FPI",
    sec.axis = sec_axis(~., name = "NCE")
  ) +
  theme(legend.position = "top")