
Joint entropy in python

I have two arrays:

import numpy as np
a = np.array(['1','2','3'])
b = np.array(['3','4','1','5'])

I want to calculate the joint entropy. I've found some material suggesting something like this:

import itertools
from functools import reduce

import numpy as np

def entropy(*X):
    # For every combination of classes across the inputs, estimate the
    # joint probability of that combination and accumulate -p * log2(p).
    return np.sum(
        -p * np.log2(p) if p > 0 else 0
        for p in (
            np.mean(reduce(np.logical_and,
                           (predictions == c for predictions, c in zip(X, classes))))
            for classes in itertools.product(*[set(x) for x in X])
        )
    )

This seems to work fine when len(a) == len(b), but it ends with an error if len(a) != len(b).

UPD: Arrays a and b were created from the following sample input:

b:3 p1:1 p2:6 p5:7
b:4 p1:2 p7:2
b:1 p3:4 p5:8
b:5 p1:3 p4:4 

Array a was created from the p1 values. So not every line contains every pK, but every line has the b property. I need to calculate the mutual information I(b, pK) for each pK.

Assuming you are talking about the joint Shannon entropy, the formula is straightforward:

H(X,Y) = -Σ_x Σ_y P(x,y) log2 P(x,y)

The problem with this, when I look at what you've done so far, is that you lack P(x,y), i.e. the joint probability of the two variables occurring together. It looks like a and b are the individual probabilities for events a and b respectively.
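To make the formula concrete, here is a minimal sketch of evaluating H(X,Y) once the full joint distribution is known. The 2x2 joint table P below is made up purely for illustration:

```python
import numpy as np

# Hypothetical joint distribution P(x,y) over two binary variables,
# used only to illustrate the formula above.
P = np.array([[0.50, 0.25],
              [0.25, 0.00]])

# H(X,Y) = -sum over all cells of P(x,y) * log2 P(x,y),
# skipping zero-probability cells (0 * log 0 is taken as 0).
nz = P[P > 0]
H = -np.sum(nz * np.log2(nz))
print(H)  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```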

You have other problems with your posted code (mentioned in the comments):

  1. Your variables are not a numeric data type: a=["1","2"] is not the same as a=[1,2]. One is a list of strings, the other a list of numbers.
  2. The length of your input data must be the same, i.e. for every x in A there must be a y in B, and you need to know the joint probability P(x,y).

Here is an idea:

  • convert the data to numbers
  • add padding, for example zeros
import numpy as np
from scipy import stats

a = np.array(['1', '2', '3', '0'])  # padded with '0' to match lengths
b = np.array(['3', '4', '1', '5'])
aa = [int(x) for x in a]
bb = [int(x) for x in b]
# Note: with two arguments, scipy.stats.entropy normalizes each list into a
# probability distribution and returns the relative entropy (KL divergence)
# of aa with respect to bb, in nats by default.
je = stats.entropy(aa, bb)
print("joint entropy : ", je)

output: 0.9083449242695364
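Since scipy.stats.entropy with two arguments returns the relative entropy rather than the joint entropy, here is a minimal sketch of estimating the joint entropy directly from paired samples, and from it the mutual information the question actually asks for. The function names are my own, and the approach assumes the observations in the two arrays are paired position by position:

```python
import numpy as np
from collections import Counter

def entropy_from_counts(counts, n):
    # H = -sum p * log2 p over the empirical distribution
    # implied by the counts (n = total number of samples).
    return -sum((c / n) * np.log2(c / n) for c in counts.values())

def joint_entropy(x, y):
    # Estimate H(X,Y) by counting how often each (x,y) pair occurs.
    assert len(x) == len(y), "observations must be paired"
    return entropy_from_counts(Counter(zip(x, y)), len(x))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    n = len(x)
    return (entropy_from_counts(Counter(x), n)
            + entropy_from_counts(Counter(y), n)
            - joint_entropy(x, y))

a = ['1', '2', '3', '0']
b = ['3', '4', '1', '5']
print(joint_entropy(a, b))       # four distinct pairs -> log2(4) = 2.0 bits
print(mutual_information(a, b))  # 2 + 2 - 2 = 2.0 bits here
```

With real data of the OP's shape, the missing pK values would still have to be padded (or those rows dropped) so that every b value has a paired pK value.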
