Applying an operation on multiple columns with a fixed column in pandas
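I have a DataFrame as shown below. The last column holds the row-wise sum of all the value columns, i.e. A, B, D, K and T. Note that some columns also contain NaN.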
word1,A,B,D,K,T,sum
na,,63.0,,,870.0,933.0
sva,,1.0,,3.0,695.0,699.0
a,,102.0,,1.0,493.0,596.0
sa,2.0,487.0,,2.0,15.0,506.0
su,1.0,44.0,,136.0,214.0,395.0
waw,1.0,9.0,,34.0,296.0,340.0
How can I compute the entropy of each row? That is, I want to compute something like
df['A']/df['sum']*log(df['A']/df['sum']) + df['B']/df['sum']*log(df['B']/df['sum']) + ...... + df['T']/df['sum']*log(df['T']/df['sum'])
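The condition is that whenever the argument of log is zero or NaN, the whole term should be treated as zero (strictly, log would fail there, because log 0 is undefined).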
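For reference, the quantity described above can be written as follows, assuming the standard Shannon entropy with the natural logarithm. The symbols x_{ic} (the value in row i, column c), s_i (the value of the sum column in row i) and p_{ic} are notation introduced here for illustration. The leading minus sign belongs to the standard definition but does not appear in the expression above.

```latex
p_{ic} = \frac{x_{ic}}{s_i}, \qquad
H_i = -\sum_{c \in \{A, B, D, K, T\}} p_{ic} \log p_{ic}, \qquad
\text{with the convention } 0 \log 0 := 0
```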
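I know how to use a lambda to operate on individual columns. Here, though, I can't think of a pure pandas solution in which the fixed column sum is applied to the different columns A, B, D, and so on, although I could do it with a simple loop over the CSV file with hard-coded column names.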
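I think you can use ix to select the columns from A to T (in pandas 1.0 and later ix has been removed, and loc does the same label-based selection), divide them by the sum column with div, and multiply by numpy.log of the result. Finally, use sum: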
print (df['A']/df['sum']*np.log(df['A']/df['sum']))
0 NaN
1 NaN
2 NaN
3 -0.021871
4 -0.015136
5 -0.017144
dtype: float64
# label-based slice of columns A..T, divided row-wise by the sum column
print (df.loc[:,'A':'T'].div(df['sum'],axis=0)*np.log(df.loc[:,'A':'T'].div(df['sum'],axis=0)))
A B D K T
0 NaN -0.181996 NaN NaN -0.065191
1 NaN -0.009370 NaN -0.023395 -0.005706
2 NaN -0.302110 NaN -0.010722 -0.156942
3 -0.021871 -0.036835 NaN -0.021871 -0.104303
4 -0.015136 -0.244472 NaN -0.367107 -0.332057
5 -0.017144 -0.096134 NaN -0.230259 -0.120651
print((df.loc[:,'A':'T'].div(df['sum'],axis=0)*np.log(df.loc[:,'A':'T'].div(df['sum'],axis=0)))
.sum(axis=1))
0 -0.247187
1 -0.038471
2 -0.469774
3 -0.184881
4 -0.958774
5 -0.464188
dtype: float64
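The NaN cells need no special handling in the result above, because sum(axis=1) skips NaN by default. A minimal sketch of that behaviour follows, using made-up proportions rather than the data from the question. The frame p and the names terms and total are hypothetical, and np.errstate is used only to silence the warning that log(0) can raise.

```python
import numpy as np
import pandas as pd

# Made-up proportions for illustration: row 1 contains an exact zero,
# row 2 contains a missing value.
p = pd.DataFrame({'A': [0.5, 0.0, np.nan],
                  'B': [0.5, 1.0, 1.0]})

# log(0) is -inf and 0 * -inf is NaN, so a zero cell ends up as NaN,
# exactly like a cell that was missing to begin with.
with np.errstate(divide='ignore', invalid='ignore'):
    terms = p * np.log(p)

# sum() uses skipna=True by default, so NaN terms contribute nothing,
# which matches the "treat the whole term as zero" condition.
total = terms.sum(axis=1)
print(total)
```

Both the zero cell and the missing cell therefore drop out of the row total without any explicit check.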
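# Drop the trailing 'sum' column (word1 is the index; see the setup below)
df1 = df.iloc[:, :-1]
# Normalise each row by its own total
df2 = df1.div(df1.sum(1), axis=0)
# Sum p * log(p) across each row; NaN terms are skipped
df2.mul(np.log(df2)).sum(1)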
word1
na -0.247187
sva -0.038471
a -0.469774
sa -0.184881
su -0.958774
waw -0.464188
dtype: float64
from io import StringIO
import numpy as np
import pandas as pd
text = """word1,A,B,D,K,T,sum
na,,63.0,,,870.0,933.0
sva,,1.0,,3.0,695.0,699.0
a,,102.0,,1.0,493.0,596.0
sa,2.0,487.0,,2.0,15.0,506.0
su,1.0,44.0,,136.0,214.0,395.0
waw,1.0,9.0,,34.0,296.0,340.0"""
df = pd.read_csv(StringIO(text), index_col=0)
df
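The steps above can be bundled into one reusable function. The sketch below is one possible way to do that, not the answerers' own code: the name row_entropy and its base parameter are hypothetical, and the explicit mask for non-positive proportions is an added assumption so that log never sees a zero. It returns the entropy with the conventional minus sign, so its values are the negation of the sums printed above.

```python
from io import StringIO

import numpy as np
import pandas as pd

text = """word1,A,B,D,K,T,sum
na,,63.0,,,870.0,933.0
sva,,1.0,,3.0,695.0,699.0
a,,102.0,,1.0,493.0,596.0
sa,2.0,487.0,,2.0,15.0,506.0
su,1.0,44.0,,136.0,214.0,395.0
waw,1.0,9.0,,34.0,296.0,340.0"""
df = pd.read_csv(StringIO(text), index_col=0)


def row_entropy(counts, base=np.e):
    """Hypothetical helper: Shannon entropy of each row of raw counts."""
    # Row totals skip NaN, so they match the precomputed 'sum' column.
    p = counts.div(counts.sum(axis=1), axis=0)
    # Assumption: turn zero (or negative) proportions into NaN up front,
    # so that log is never evaluated at 0.
    p = p.where(p > 0)
    # NaN terms are skipped by sum(); dividing by log(base) changes the unit.
    return -(p * np.log(p)).sum(axis=1) / np.log(base)


counts = df.drop(columns='sum')   # keep only the value columns A..T
h = row_entropy(counts)           # natural-log entropy (nats)
h_bits = row_entropy(counts, base=2)
print(h)
```

Recomputing the row totals inside the function means it does not depend on a precomputed sum column being present or correct.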