Get min and max values from pandas dataframes within a dictionary in Python

I have a dictionary (pollution) with one key I wish to ignore (chemical_start_time); the values of all other keys are pandas dataframes.

I want to get the maximum value present in any of the dataframes and the minimum non-zero value.

I believe the following code does exactly this, but I'm looking for the most efficient or "pythonic" way of doing it:

import numpy as np

max_pols = []
min_pols = []

for key, df in pollution.items():
    if key != 'chemical_start_time':
        max_pols.append(max(df.max()))
        min_pols.append(np.nanmin(df[df > 0].min()))

max_pol = max(max_pols)
min_pol = min(min_pols)

One way to improve performance is to flatten all values of each DataFrame into a 1-D array with numpy.ravel and then use np.max and np.min (or np.nanmax and np.nanmin if missing values are possible):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
         'C':[7,8,9,4,2,3],
         'D':[10,3,5,-7,10,0],
         'E':[5,-3,6,9,2,4],
})

df2 = pd.DataFrame({
         'A':[73,8,9,4,2,3],
         'D':[1,3,52,-7,1,0],
         'E':[53,-33,63,9,2,4],
})
pollution = {'a':df1, 'b':df2, 'chemical_start_time':pd.DataFrame([100])}

max_pols = []
min_pols = []

for key, df in pollution.items():
    if key != 'chemical_start_time':
        v = df.values.ravel()
        max_pols.append(np.max(v))
        min_pols.append(np.min(v[v > 0]))

max_pol = np.max(max_pols)
min_pol = np.min(min_pols)

print(max_pol)  # 73
print(min_pol)  # 1
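The note above about missing values can be illustrated with a small sketch. The sample frame here is hypothetical and not from the question. It suggests that the boolean mask `v > 0` already drops NaN, so only the maximum needs the NaN-aware function in this approach:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value, for illustration only.
df = pd.DataFrame({'C': [7.0, np.nan, 9.0], 'D': [0.0, 3.0, -7.0]})

v = df.to_numpy().ravel()      # flatten all values into a 1-D array
positive = v[v > 0]            # NaN > 0 is False, so NaN is filtered out here

plain_max = np.max(v)          # propagates NaN
nan_safe_max = np.nanmax(v)    # ignores NaN
min_positive = np.min(positive)

print(plain_max, nan_safe_max, min_positive)
```

If the data is known to contain no missing values, the plain np.max and np.min calls are sufficient.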

Alternatively, inside the loop you can use the pandas methods directly:

max_pols.append(df.max().max())
min_pols.append(df[df > 0].min().min())
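Since the question asks for a more "pythonic" form, the same pandas calls can also be written without the intermediate lists. This is only a sketch, reusing the sample data from this answer, and it assumes every relevant frame contains at least one positive value:

```python
import pandas as pd

df1 = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                    'D': [10, 3, 5, -7, 10, 0],
                    'E': [5, -3, 6, 9, 2, 4]})
df2 = pd.DataFrame({'A': [73, 8, 9, 4, 2, 3],
                    'D': [1, 3, 52, -7, 1, 0],
                    'E': [53, -33, 63, 9, 2, 4]})
pollution = {'a': df1, 'b': df2, 'chemical_start_time': pd.DataFrame([100])}

# Keep only the frames that should be inspected.
frames = [df for key, df in pollution.items() if key != 'chemical_start_time']

# Generator expressions replace the max_pols / min_pols lists.
max_pol = max(df.max().max() for df in frames)
# df[df > 0] masks non-positive cells as NaN; pandas min() skips NaN by default.
min_pol = min(df[df > 0].min().min() for df in frames)

print(max_pol, min_pol)
```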

Combine all relevant dataframes into one:

frames = pd.concat([frame for key, frame in pollution.items() if key != 'chemical_start_time'])

Then get the max and min values:

max_pol = frames.max().max()
min_pol = frames[frames > 0].min().min()
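One thing to be aware of with pd.concat: when the frames do not share the same columns, the combined frame is padded with NaN. The pandas max() and min() methods skip NaN by default, so the result is unaffected. As a possible alternative, and only as a sketch using the sample data from the earlier answer, the values can be flattened and joined with np.concatenate, which avoids column alignment altogether:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                    'D': [10, 3, 5, -7, 10, 0],
                    'E': [5, -3, 6, 9, 2, 4]})
df2 = pd.DataFrame({'A': [73, 8, 9, 4, 2, 3],
                    'D': [1, 3, 52, -7, 1, 0],
                    'E': [53, -33, 63, 9, 2, 4]})
pollution = {'a': df1, 'b': df2, 'chemical_start_time': pd.DataFrame([100])}

relevant = [frame for key, frame in pollution.items()
            if key != 'chemical_start_time']

# pd.concat aligns on column names, so columns A and C contain NaN here.
frames = pd.concat(relevant)
has_padding = bool(frames.isna().any().any())

# Flatten each frame and join everything into a single 1-D array instead.
values = np.concatenate([frame.to_numpy().ravel() for frame in relevant])
max_pol = values.max()
min_pol = values[values > 0].min()

print(has_padding, max_pol, min_pol)
```

Both routes give the same maximum and minimum positive value for this data; which one is faster will depend on the number and size of the frames.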
