
Calculating entropy

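I am new to R and cannot get an entropy calculation to work. There is a similar question on Stack Overflow with answers, but I would like to know why this code does not work. The data below is copied and pasted from that same question.

One of the answers there says: "I think the part you are missing is the calculation of the class frequencies, and you will get the answer", but how do I fix that? I have tried most of the options and still get no output. The code just runs without any error.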

info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in 1:length(freq.class)){
    if(freq.class[[i]] != 0){ # zero check in class
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]]))  #I calculate the entropy for each class i here
    }else{
      entropy <- 0
    }
    info <- info + entropy # sum up entropy from all classes
  }
  return(info)
}

The dataset is as follows:

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")

student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")

income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) 

Note that age has been changed from a categorical variable to a numeric one.

Cheers, Jack

You need to compute the proportions of "no" and "yes" in buys, of "fair" and "excellent" in credit, and so on. Here is one way to do it:

data <- list(
  buys = c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  credit = c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent"),
  student = c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no"),
  income = c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium"),
  age = c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) 
  )

freq <- lapply( data, function(x){rowMeans(outer(unique(x),x,"=="))})

> freq
$buys
[1] 0.3571429 0.6428571

$credit
[1] 0.5714286 0.4285714

$student
[1] 0.5 0.5

$income
[1] 0.2857143 0.4285714 0.2857143

$age
 [1] 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857
[14] 0.07142857
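
The same proportions can probably be obtained more directly with base R's table() and prop.table(). This is only a sketch of an alternative, not the method used in this answer, and props is a name made up for the example. One difference to be aware of: table() sorts the distinct values, while unique() keeps them in order of first appearance, so the proportions may come out in a different order. The entropy does not depend on that order.

```r
# Two of the vectors from the question, repeated so the block runs on its own.
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

# Hypothetical helper: relative frequency of each distinct value in x.
props <- function(x) prop.table(table(x))

props(buys)    # proportions of "no" and "yes"
props(income)  # proportions of "high", "low" and "medium"
```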

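Proportions computed this way can never be 0, because every value returned by unique(x) occurs at least once in x. So change

if(freq.class[[i]] != 0){ # zero check in class

to

if(length(freq.class[[i]]) != 0){ # zero check in class

The function then becomes: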

info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in 1:length(freq.class)){
    if(length(freq.class[[i]]) != 0){ # zero check in class
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]]))  #I calculate the entropy for each class i here
    }else{ 
      entropy <- 0
    } 
    info <- info + entropy # sum up entropy from all classes
  }
  return(info)
}

> info(freq)
[1] 8.289526
> info(freq$buys)
[1] 0.940286
> info(freq$age)
[1] 3.807355
> 
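
For a single vector of proportions p, the loop above reduces to one vectorized expression, -sum(p * log2(p)). A minimal sketch follows, assuming a helper named entropy, which is hypothetical and not part of the original answer. The proportions are typed in by hand from the counts in the data (5 "no" and 9 "yes" in buys, 14 distinct ages) so that the block runs on its own.

```r
# Hypothetical helper: Shannon entropy, in bits, of one vector of proportions.
# Zero proportions are dropped first, since they contribute nothing to the sum
# and log2(0) would otherwise produce NaN.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

buys_p <- c(5, 9) / 14     # proportions of "no" and "yes" in buys
age_p  <- rep(1 / 14, 14)  # every age value occurs exactly once

entropy(buys_p)  # about 0.940286, the same value as info(freq$buys)
entropy(age_p)   # log2(14), about 3.807355, the same value as info(freq$age)
```

With the freq list built earlier, sapply(freq, entropy) should give one entropy per variable instead of their sum.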

