
Get number of maximum values per column in pandas

I have the following dataframe with time series data per day:

time-orig   00:15:00    00:30:00    00:45:00    01:00:00
date                
2010-01-04  1164.3  1163.5  1162.8  1161.8
2010-01-05  1186.3  1185.8  1185.6  1185.0
2010-01-06  1181.5  1181.5  1182.7  1182.3
2010-01-07  1202.1  1201.9  1201.7  1200.8

Now I want to get the number of maximum values per column, like this:

'00:15:00' : 3
'00:30:00' : 0
'00:45:00' : 1
'01:00:00' : 0

(i.e., the column '00:15:00' has 3 maxima, looking at the maximum per row.)

I know I could transpose the dataframe, loop over the columns, and use idxmax(), but is there a vectorized or otherwise better way of doing this?

One approach is to use np.argmax on the underlying array data and then do a binned count of the max indices with np.bincount:

np.bincount(df.iloc[:,1:].values.argmax(1), minlength=df.shape[1]-1)

Sample run:

In [141]: df
Out[141]: 
    time-orig  00:15:00  00:30:00  00:45:00  01:00:00
0  2010-01-04    1164.3    1163.5    1162.8    1161.8
1  2010-01-05    1186.3    1185.8    1185.6    1185.0
2  2010-01-06    1181.5    1181.5    1182.7    1182.3
3  2010-01-07    1202.1    1201.9    1201.7    1200.8

In [142]: c = np.bincount(df.iloc[:,1:].values.argmax(1), minlength=df.shape[1]-1)

In [143]: c
Out[143]: array([3, 0, 1, 0])

In [144]: np.c_[df.columns[1:], c]
Out[144]: 
array([['00:15:00', 3],
       ['00:30:00', 0],
       ['00:45:00', 1],
       ['01:00:00', 0]], dtype=object)
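
For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch of the same argmax/bincount steps. It assumes, as in the sample run above, that the date is an ordinary first column called time-orig; the variable names (values, row_argmax, counts, result) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the date is a regular first column here.
df = pd.DataFrame({
    "time-orig": ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"],
    "00:15:00": [1164.3, 1186.3, 1181.5, 1202.1],
    "00:30:00": [1163.5, 1185.8, 1181.5, 1201.9],
    "00:45:00": [1162.8, 1185.6, 1182.7, 1201.7],
    "01:00:00": [1161.8, 1185.0, 1182.3, 1200.8],
})

# Skip the date column and work on the plain NumPy array.
values = df.iloc[:, 1:].to_numpy()

# Position of the maximum within each row.
row_argmax = values.argmax(axis=1)

# Count how often each position wins; minlength keeps columns that never win.
counts = np.bincount(row_argmax, minlength=values.shape[1])

# Pair the counts with the column labels.
result = dict(zip(df.columns[1:], counts.tolist()))
print(result)
```

Without minlength, np.bincount only returns entries up to the largest index it sees, so trailing columns that never hold a row maximum would be missing from the result.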

The assumption here is that date is the index. You can use df.idxmax followed by Series.value_counts:

print(df) 
time-orig   00:15:00  00:30:00  00:45:00  01:00:00
date                                              
2010-01-04    1164.3    1163.5    1162.8    1161.8
2010-01-05    1186.3    1185.8    1185.6    1185.0
2010-01-06    1181.5    1181.5    1182.7    1182.3
2010-01-07    1202.1    1201.9    1201.7    1200.8

s = df.idxmax(1).value_counts().reindex(df.columns, fill_value=0)
print(s)

time-orig
00:15:00    3
00:30:00    0
00:45:00    1
01:00:00    0
dtype: int64
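
The same idea as a self-contained sketch, assuming date is the index and that the columns axis is named time-orig as in the printout above. The names winners and counts are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the date as the index.
df = pd.DataFrame(
    {
        "00:15:00": [1164.3, 1186.3, 1181.5, 1202.1],
        "00:30:00": [1163.5, 1185.8, 1181.5, 1201.9],
        "00:45:00": [1162.8, 1185.6, 1182.7, 1201.7],
        "01:00:00": [1161.8, 1185.0, 1182.3, 1200.8],
    },
    index=pd.Index(
        ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"], name="date"
    ),
)
df.columns.name = "time-orig"

# Label of the column holding the maximum in each row.
winners = df.idxmax(axis=1)

# value_counts only lists labels that occur at least once, so reindex
# against all columns and fill the missing ones with 0.
counts = winners.value_counts().reindex(df.columns, fill_value=0)
print(counts)
```

The reindex step is what puts '00:30:00' and '01:00:00' into the result with a count of 0 and restores the original column order.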

Divakar's solution is quite fast if you want a NumPy array. For your exact data, a slight modification to his answer is needed, because date is the index here and there is no leading column to skip:

val = np.bincount(df.values.argmax(1), minlength=df.shape[1])
s = pd.Series(val, df.columns)
print(s)

time-orig
00:15:00    3
00:30:00    0
00:45:00    1
01:00:00    0
dtype: int64
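
One caveat that applies to both approaches: argmax and idxmax return only the first occurrence when a row's maximum appears in several columns. The sample data has no such ties, so it makes no difference there. If tied maxima should each be counted, one possible sketch compares every value with its row maximum instead. The small frame below is hypothetical data made up to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which every row has a tied maximum.
df = pd.DataFrame({
    "a": [1.0, 5.0, 2.0],
    "b": [1.0, 3.0, 4.0],
    "c": [0.0, 5.0, 4.0],
})

# argmax credits only the first column that reaches the row maximum.
first_only = np.bincount(df.to_numpy().argmax(axis=1), minlength=df.shape[1])

# Comparing each row with its own maximum credits every tied column.
all_ties = df.eq(df.max(axis=1), axis=0).sum()

print(first_only)
print(all_ties)
```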


 