I have a dataframe of values, say:
df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                  columns=['a', 'b', 'c'])
in which every row is a vector of probabilities. I want to compute something like the correlation matrix returned by df.corr(), but with relative entropy in place of correlation.
What is the best way to do this? I can't find a way to get inside the .corr() method and simply change the function it uses.
IIUC, pass the function to .corr through its method argument, as follows:
import pandas as pd
import numpy as np
from scipy.stats import entropy
df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                  columns=['a', 'b', 'c'])
res = df.corr(method=entropy)
print(res)
Output:
          a         b         c
a  1.000000  0.160246  0.270608
b  0.160246  1.000000  0.167465
c  0.270608  0.167465  1.000000
From the documentation:
callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable's behavior.
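That note matters for relative entropy, which is not symmetric and is 0 (not 1) between a distribution and itself. Also, .corr compares columns, while the question describes each row as a probability vector. If the full asymmetric matrix between rows is what you need, it may be simpler to build it directly. The following is only a minimal sketch under that assumption; pairwise_relative_entropy is a hypothetical helper name, not a pandas or SciPy function. Note that scipy.stats.entropy normalizes its inputs, so a row that does not sum to 1 (such as the second row here) is rescaled first.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                  columns=['a', 'b', 'c'])


def pairwise_relative_entropy(frame):
    """Hypothetical helper: entry (i, j) is KL(row_i || row_j)."""
    values = frame.to_numpy()
    n = len(values)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # entropy(pk, qk) returns the relative entropy of pk with respect to qk
            out[i, j] = entropy(values[i], values[j])
    return pd.DataFrame(out, index=frame.index, columns=frame.index)


kl = pairwise_relative_entropy(df)
print(kl)
```

The result has zeros on the diagonal and generally differs from its transpose, which the matrix returned by .corr cannot represent. To compare columns instead, pass df.T to the helper.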