SQL groupby with division in pandas
I have a SQL statement and would like to know how to write it in pandas, perhaps using groupby and apply.
Given a table with columns A and B:
Select A, sum(B) / sum(A)
from table
group by A;
This is what I have so far:
def func(group):
    x = group['B']
    y = group['A']
    return x.sum() / y.sum()

table.groupby('A').apply(func)
This produces a sequence of numbers without column A, the column used for grouping. I would like the output to be a DataFrame that also has A as a separate column, just like the result of the SQL statement above. Can anyone help me with this?
Thanks!
Is this what you want?
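import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})

def func(group):
    x = group['B']
    y = group['A']
    return x.sum() / y.sum()

# reset_index() turns the group keys back into a regular column A.
# Note: recent pandas versions warn about (or exclude) the grouping column inside apply.
df.groupby('A').apply(func).reset_index()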
Out[934]:
   A         0
0  1  2.500000
1  3  1.333333
2  4  1.250000
There's no need for an apply here. It would be a lot faster to groupby, calculate the sums and divide directly, as pandas vectorises these operations.
Borrowing from @Wen's setup, this is how I'd do it:
v = df.groupby('A')[['A', 'B']].sum()
v['B'] /= v['A']
del v['A']
          B
A
1  2.500000
3  1.333333
4  1.250000
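If A should come back as an ordinary column, as in the SQL result, one possible variation is to divide the two grouped sums and then call reset_index(). This is only a sketch under the same setup; the names sum_a, sum_b and the column name ratio are made up for illustration and do not come from the answers above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})

# Per-group sums; both Series are indexed by the group key A.
sum_b = df.groupby('A')['B'].sum()
sum_a = df.groupby('A')['A'].sum()

# Index-aligned division, then move A from the index into a column.
# 'ratio' is an illustrative name for sum(B) / sum(A).
result = (sum_b / sum_a).rename('ratio').reset_index()
print(result)
```

The result is a DataFrame with columns A and ratio, one row per distinct value of A, which is the closest match to the shape of the SQL output.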