
Calculate min, max and average on dataframe column

I have a dataframe with the columns bn, pn, s, tempC, tempF and humidity. Each value in tempC, tempF and humidity is a list. I want to calculate the min, max and average of each list in tempC, tempF and humidity, and I also want to keep all the original values. I don't know how to do it.

       bn    pn  s             tempC             tempF          humidity
0  4562562240  0020  2          [31, 33]          [88, 91]          [78, 74]
1  4562562240  0030  2          [33, 34]          [91, 92]          [74, 70]
2  4562562240  0040  2          [34, 35]          [92, 94]          [70, 67]
3  4562562240  0050  2          [35, 35]          [94, 96]          [67, 64]
4  4562562240  0060  2  [35, 35, 35, 35]  [96, 95, 95, 95]  [64, 65, 66, 67]

So the output should look like this:

       bn    pn  s             tempC             tempF          humidity        min_tempC   max_tempC   avg_tempC   min_tempF   max_tempF  avg_tempF   ...
0  4562562240  0020  2          [31, 33]          [88, 91]          [78, 74]         31         33          32          88          91      89.5
1  4562562240  0030  2          [33, 34]          [91, 92]          [74, 70]         33         34          33.5        91          92      91.5

.
.
.

Use a custom function with list comprehensions:

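import pandas as pd

def f(x):
    # x is one row: a Series whose values are the lists in tempC, tempF, humidity
    a = pd.Series([min(i) for i in x], index=x.index)
    b = pd.Series([max(i) for i in x], index=x.index)
    c = pd.Series([sum(i)/len(i) for i in x], index=x.index)
    # stack the three results under a (statistic, column) MultiIndex
    return pd.concat([a,b,c], keys=('min','max','mean'))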


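cols = ['tempC','tempF','humidity']
# apply f to every row, then sort the new columns on level 1 (the source column name)
df1 = df[cols].agg(f, axis=1).sort_index(axis=1, level=1)
# flatten ('min', 'tempC') into 'min_tempC'
df1.columns = df1.columns.map('_'.join)

# attach the statistics to the original frame; the list columns are kept
df = df.join(df1)
print(df)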
           bn  pn  s             tempC             tempF          humidity  \
0  4562562240  20  2          [31, 33]          [88, 91]          [78, 74]   
1  4562562240  30  2          [33, 34]          [91, 92]          [74, 70]   
2  4562562240  40  2          [34, 35]          [92, 94]          [70, 67]   
3  4562562240  50  2          [35, 35]          [94, 96]          [67, 64]   
4  4562562240  60  2  [35, 35, 35, 35]  [96, 95, 95, 95]  [64, 65, 66, 67]   

   min_tempC  max_tempC  mean_tempC  min_tempF  max_tempF  mean_tempF  \
0       31.0       33.0        32.0       88.0       91.0       89.50   
1       33.0       34.0        33.5       91.0       92.0       91.50   
2       34.0       35.0        34.5       92.0       94.0       93.00   
3       35.0       35.0        35.0       94.0       96.0       95.00   
4       35.0       35.0        35.0       95.0       96.0       95.25   

   min_humidity  max_humidity  mean_humidity  
0          74.0          78.0           76.0  
1          70.0          74.0           72.0  
2          67.0          70.0           68.5  
3          64.0          67.0           65.5  
4          64.0          67.0           65.5  
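
An alternative that avoids the row-wise function is to flatten each list column and group on the original row label. This is only a sketch of a different technique, not the method used above: it relies on `Series.explode` (available since pandas 0.25), rebuilds the sample frame from the question, and assumes every list holds numbers.

```python
import pandas as pd

# Sample data copied from the question; pn is kept as a string to preserve leading zeros.
df = pd.DataFrame({
    "bn": [4562562240] * 5,
    "pn": ["0020", "0030", "0040", "0050", "0060"],
    "s": [2] * 5,
    "tempC": [[31, 33], [33, 34], [34, 35], [35, 35], [35, 35, 35, 35]],
    "tempF": [[88, 91], [91, 92], [92, 94], [94, 96], [96, 95, 95, 95]],
    "humidity": [[78, 74], [74, 70], [70, 67], [67, 64], [64, 65, 66, 67]],
})

cols = ["tempC", "tempF", "humidity"]
parts = []
for col in cols:
    # explode() yields one row per list element and repeats the original row label
    flat = df[col].explode().astype(float)
    # grouping on that repeated label gives the per-row statistics
    stats = flat.groupby(level=0).agg(["min", "max", "mean"])
    # rename min/max/mean to min_tempC, max_tempC, mean_tempC, and so on
    parts.append(stats.add_suffix(f"_{col}"))

# join on the index, so the original list columns stay in place
result = df.join(pd.concat(parts, axis=1))
print(result.drop(columns=cols))
```

Because the statistics are built one source column at a time, the new columns come out grouped per column in min, max, mean order without any sorting step.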

For example, for a single column you can do:

temp_c_min = [min(i) for i in df['tempC']]

Then create a one-column data frame:

df_tempC = pandas.DataFrame(temp_c_min, columns=['tempC min'])

Then add this to your original df with df['tempC min'] = df_tempC, which adds one new column to df. You can do the same for the other statistics and columns. Is this okay?
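
The steps above can probably be shortened: a plain Python list with one entry per row can be assigned to a new column directly, so the intermediate one-column data frame is optional. Here is a minimal sketch that repeats this for each statistic, assuming every list is non-empty. The small sample frame and the `_min`/`_max`/`_mean` column names are chosen here for illustration only.

```python
import pandas as pd

# A reduced version of the question's data, just enough to show the pattern.
df = pd.DataFrame({
    "tempC": [[31, 33], [33, 34], [35, 35, 35, 35]],
    "tempF": [[88, 91], [91, 92], [96, 95, 95, 95]],
})

for col in ["tempC", "tempF"]:
    # each comprehension produces one value per row, in row order
    df[f"{col}_min"] = [min(v) for v in df[col]]
    df[f"{col}_max"] = [max(v) for v in df[col]]
    df[f"{col}_mean"] = [sum(v) / len(v) for v in df[col]]

print(df)
```

The original list columns are left untouched, which matches the requirement to keep the raw values next to the computed ones.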
