Averaging indices of table using pandas and numpy

I have been playing with pandas for a few hours now, and I was wondering whether there is a faster way to add an extra column to a table that holds the average of each row. At the moment I create a new list containing the means and then incorporate it into the data frame.

This is my code:

import numpy as np
import pandas as pd
userdata={"A":[2,5],"B":[4,6]}
tab=pd.DataFrame((userdata), columns=["A","B"])
lst=[np.mean([tab.loc[i,"A"],tab.loc[i,"B"]]) for i in range(len(tab.index))]
tab["Average of A and B"]=pd.DataFrame(lst)
tab

Try df.mean(1) with assign. df.mean(1) tells pandas to calculate the mean along axis=1, that is, across the columns of each row, giving one value per row. axis=0 (one value per column) is the default.

df.assign(Mean=df.mean(1))

This produces a copy of df with the added column.
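As a small illustration of the two points above (the axis argument and the fact that assign returns a copy), here is a sketch that reuses the question's toy data. The variable names col_means, row_means and with_mean are made up for this example.

```python
import pandas as pd

# Same toy data as in the question.
tab = pd.DataFrame({"A": [2, 5], "B": [4, 6]})

col_means = tab.mean(axis=0)  # default axis: one value per column
row_means = tab.mean(axis=1)  # one value per row

# assign returns a new DataFrame; tab itself is left unchanged.
with_mean = tab.assign(Mean=row_means)

print(col_means.tolist())       # [3.5, 5.0]
print(row_means.tolist())       # [3.0, 5.5]
print(list(tab.columns))        # ['A', 'B']
print(list(with_mean.columns))  # ['A', 'B', 'Mean']
```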

To alter the existing dataframe instead:

df['Mean'] = df.mean(1)

Demo:

tab.assign(Mean=tab.mean(1))

   A  B  Mean
0  2  4   3.0
1  5  6   5.5

A NumPy solution would be to work with the underlying array data for performance:

tab['average'] = tab.values.mean(1)

To choose specific columns, like 'A' and 'B':

tab['average'] = tab[['A','B']].values.mean(1)
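One thing to keep in mind when swapping the pandas call for the NumPy one is that the two can differ when data is missing: DataFrame.mean skips NaN by default, while the array's mean method propagates it. The sketch below uses a made-up frame with one NaN to show this. It uses to_numpy(), which the pandas documentation recommends over .values, and np.nanmean as one possible way to get the NaN-skipping behaviour back.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column B.
tab = pd.DataFrame({"A": [2.0, 5.0], "B": [4.0, np.nan]})

# pandas skips NaN by default (skipna=True).
pandas_mean = tab[["A", "B"]].mean(axis=1)

# ndarray.mean propagates NaN; np.nanmean ignores it.
arr = tab[["A", "B"]].to_numpy()
numpy_mean = arr.mean(axis=1)
numpy_nanmean = np.nanmean(arr, axis=1)

print(pandas_mean.tolist())    # [3.0, 5.0]
print(numpy_mean.tolist())     # [3.0, nan]
print(numpy_nanmean.tolist())  # [3.0, 5.0]
```

If the columns are known to contain no missing values, the plain array mean gives the same result as the pandas call.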

Runtime test:

In [41]: tab = pd.DataFrame(np.random.randint(0,9,(10000,10)))

# @piRSquared's soln
In [42]: %timeit tab.assign(Mean=tab.mean(1))
1000 loops, best of 3: 615 µs per loop

In [43]: tab = pd.DataFrame(np.random.randint(0,9,(10000,10)))

In [44]: %timeit tab['average'] = tab.values.mean(1)
1000 loops, best of 3: 297 µs per loop


In [37]: tab = pd.DataFrame(np.random.randint(0,9,(10000,100)))

# @piRSquared's soln
In [38]: %timeit tab.assign(Mean=tab.mean(1))
100 loops, best of 3: 4.71 ms per loop

In [39]: tab = pd.DataFrame(np.random.randint(0,9,(10000,100)))

In [40]: %timeit tab['average'] = tab.values.mean(1)
100 loops, best of 3: 3.6 ms per loop
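For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The function names with_assign and with_numpy are made up for this example. Unlike the in-place version timed above, with_numpy writes to a copy, so the frame stays the same between repetitions and both variants pay for a copy. Absolute numbers will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tab = pd.DataFrame(rng.integers(0, 9, size=(10000, 10)))


def with_assign():
    # pandas row mean, returned in a new DataFrame.
    return tab.assign(Mean=tab.mean(axis=1))


def with_numpy():
    # NumPy row mean on the underlying array, written to a copy.
    out = tab.copy()
    out["average"] = tab.to_numpy().mean(axis=1)
    return out


runs = 20
t_assign = min(timeit.repeat(with_assign, number=runs, repeat=3)) / runs
t_numpy = min(timeit.repeat(with_numpy, number=runs, repeat=3)) / runs

print(f"assign + DataFrame.mean: {t_assign * 1e6:.0f} µs per call")
print(f"copy + ndarray.mean:     {t_numpy * 1e6:.0f} µs per call")
```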
