How can I adapt and use Excel's SUMIF function in pandas in a way that is fast and Pythonic?

I'm working on a project that ports functions and operations from Excel to Python using pandas.

I have a lot of SUMIF functions I'm trying to replicate in the data. The trouble is that I don't think pandas has a directly analogous function. I might have an Excel expression like:

=(SUMIFS('Sheetname'!BI$146:BI$282,'Sheetname'!$B$146:$B$282,$I1836))

Here the first argument is the range to be summed, the second argument is the range checked against the matching criterion, and the last argument is the specific value we're looking for.

What I'm doing right now is running a nested loop: the outer loop iterates over the rows and finds the matching rows in the reference table, while the inner loop iterates over the columns. The values are then summed and written into the pandas DataFrame.

Something like this, where table_dict_temp is the table I'm populating and table_temp is the table being referenced:

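key_col = 'COLUMN OF COMPARISON'  # placeholder: the column both tables are matched on

for i in range(len(table_dict_temp)):
    # Rows of the reference table whose key matches row i of the table being filled
    cog_loss = table_temp.loc[table_temp[key_col] == table_dict_temp[key_col][i]]
    # Columns from position 10 onward hold the values to be summed
    for j in range(10, len(table_dict_temp.columns)):
        cog_loss_temp = cog_loss[table_dict_temp.columns[j]].sum()
        table_dict_temp.iloc[i, j] = cog_loss_temp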

The problem I'm running into is that this seems like an un-Pythonic way to do it, and it also takes a lot of time. Any advice on how I can write the function to be faster would be greatly appreciated!

The Excel example data:

https://support.microsoft.com/en-us/office/sumifs-function-c9e748f5-7ea7-455d-9406-611cebce642b

Quantity Sold   Product      Salesperson
5               Apples       Tom
4               Apples       Sarah
15              Artichokes   Tom
3               Artichokes   Sarah
22              Bananas      Tom
12              Bananas      Sarah
10              Carrots      Tom
33              Carrots      Sarah
    
Description

=SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")

Adds the number of products that begin with A and were sold by Tom. 
It uses the wildcard character * in Criteria1, "=A*" to look for matching product names in Criteria_range1 B2:B9, 
and looks for the name "Tom" in Criteria_range2 C2:C9. 
It then adds the numbers in Sum_range A2:A9 that meet both conditions. 
The result is 20.

=SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")

Adds the number of products that aren’t bananas and are sold by Tom. 
It excludes bananas by using <> in the Criteria1, "<>Bananas", 
and looks for the name "Tom" in Criteria_range2 C2:C9. 
It then adds the numbers in Sum_range A2:A9 that meet both conditions. 
The result is 30.

The Pythonic solution:

import io
import pandas as pd

# Comma-separated so the sample parses even if tabs are lost when copying
data_str = '''
Quantity Sold,Product,Salesperson
5,Apples,Tom
4,Apples,Sarah
15,Artichokes,Tom
3,Artichokes,Sarah
22,Bananas,Tom
12,Bananas,Sarah
10,Carrots,Tom
33,Carrots,Sarah
'''.strip()

df = pd.read_csv(io.StringIO(data_str))

# =SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
# Start from True and AND in one boolean mask per criterion
cond = True
cond &= df['Product'].str.startswith('A')
cond &= df['Salesperson'] == 'Tom'
print(df.loc[cond, 'Quantity Sold'].sum())  # 20

# =SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
cond = True
cond &= df['Product'] != 'Bananas'
cond &= df['Salesperson'] == 'Tom'
print(df.loc[cond, 'Quantity Sold'].sum())  # 30
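
For the nested loop in the question, where the same SUMIF is evaluated for every row of one table and across many columns, a grouped sum may be a faster fit than building one mask per cell. The following is only a minimal sketch under assumptions: the column names (region, q1, q2) and the sample values are hypothetical stand-ins for the question's tables, and the criterion is assumed to be a plain equality match on a single key column.

```python
import pandas as pd

# Hypothetical stand-ins for the question's tables; all names and values are assumptions.
key_col = "region"           # the "column of comparison"
value_cols = ["q1", "q2"]    # the columns to be summed

# table_temp: the reference table holding the raw rows
table_temp = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "q1": [1, 2, 3, 4, 5],
    "q2": [10, 20, 30, 40, 50],
})

# table_dict_temp: the table being populated, one row per lookup key
table_dict_temp = pd.DataFrame({
    "region": ["north", "south", "west"],
    "q1": 0,
    "q2": 0,
})

# One pass over the reference table: sum every value column per key.
sums = table_temp.groupby(key_col)[value_cols].sum()

# Align the sums to the lookup keys, in lookup order.
# Keys with no matching rows get 0, which is what SUMIF returns as well.
filled = sums.reindex(table_dict_temp[key_col], fill_value=0)

# Write the aligned values back by position.
table_dict_temp[value_cols] = filled.to_numpy()

print(table_dict_temp)
```

The grouping scans table_temp once instead of once per row of table_dict_temp, which is where most of the time in the loop version goes. If the criteria are not simple equality (for example wildcards or "<>"), the boolean-mask approach above is still the more direct translation.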

