Reducing the execution time of a SUMIFS equivalent

Question

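I am trying to reproduce the Excel function SUMIFS, which is roughly: accumulation1 = SUMIFS(value; $fin$1:$fin$5; ini$1)

What the formula does: for each row, it finds the rows whose fin matches that row's ini and sums their values.

Example for id 3 and accumulation1: find and sum the values of the rows whose end point fin equals ini = 11, i.e. the values of id 1 and id 5: 3 + 5 = 8.

Then a new accumulation column is created and the same calculation is run again, using the previous accumulation column as input (I have to do this 1004 times...).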

id	ini	fin	value	accumulation1	accumulation2	cumulative sum
1	10	11	5	0	0	5
2	9	10	0	0	0	0
3	11	12	2	8	0	10
4	12	13	1	2	8	11
5	5	11	3	0	0	3
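
To make the worked example concrete, here is a minimal sketch of a single SUMIFS-style lookup in pandas. It assumes the table above is loaded into a DataFrame `df` with columns `ini`, `fin` and `value`; the helper name `sumifs` is only illustrative and not part of my actual script:

```python
import pandas as pd

# Sample table from above (column names are assumed)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "ini": [10, 9, 11, 12, 5],
    "fin": [11, 10, 12, 13, 11],
    "value": [5, 0, 2, 1, 3],
})


def sumifs(frame, source_col, ini):
    """Sum source_col over the rows whose fin equals the given ini."""
    return int(frame.loc[frame["fin"] == ini, source_col].sum())


# accumulation1 for id 3 (ini = 11): the rows with id 1 and id 5 end at node 11
acc1_id3 = sumifs(df, "value", 11)
print(acc1_id3)
```

Calling the helper once per row gives one accumulation column; feeding that column back in as `source_col` gives the next one.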

My accumulation code currently looks like this:

    import time

    import pandas.io.sql as pdsql  # assumed: pdsql is an alias for pandas.io.sql
    import psycopg2
    import psycopg2.extras

    start_time = time.time()

    connection = psycopg2.connect(dbname=DB_NAME, user=DB_USER, password=DB_PWD, host=DB_HOST, port=DB_PORT)
    cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)

    data = pdsql.read_sql_query("select id_bdcarth, id_nd_ini::int ini, id_nd_fin::int fin, v from tempturbi.tmp_somme_v19", connection)

    Endtest = 1

    # loop until Endtest == 0:
    #     create a new accumulation column
    for i in data.ini:
        acc = data.v.loc[data.fin == i]  # get the values of the upstream segments
        acc = sum(acc)
        # save acc in the accumulation column

    # Endtest = sum of the new accumulation column

    print("--- %s seconds ---" % (time.time() - start_time))

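Even without saving the results, the script takes 129 seconds to run, which is much slower than Excel. Is there a way to improve the script and make it faster?

What I want to do is walk along a river network and accumulate the values.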

Answer 1

So I made a few changes:

z = 0
prev = 'v'  # column that the current pass accumulates
loop = [0, 1, 2]
# while total != 0:
for _ in loop:
    z = z + 1
    acc = 'acc' + str(z)

    # for each i in ini
    vals = []
    for i in data.ini:
        # sum the previous column over the segments whose fin equals i
        val = data[prev].loc[data.fin == i]
        vals.append(sum(val))

    # create the column and store the values
    data[acc] = vals
    prev = acc

    print(data[acc].sum())

print(data)
print("--- %s seconds ---" % (time.time() - start_time))

(This did not change the execution time.)
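
A likely reason the run time stays the same is that the inner loop still evaluates `data.fin == i` once per row, and each of those comparisons scans the whole `fin` column. Below is a minimal sketch of one pass that avoids the repeated scan by building the per-`fin` totals once in a plain dictionary. The function name `sumifs_pass` and the hard-coded lists (the sample table from the question) are only for illustration:

```python
from collections import defaultdict


def sumifs_pass(ini, fin, values):
    """One accumulation pass: for each ini, sum the values whose fin equals it."""
    totals = defaultdict(int)
    for f, v in zip(fin, values):
        totals[f] += v  # single scan over the table
    # dictionary lookup per row instead of a full column comparison
    return [totals.get(i, 0) for i in ini]


# Sample table from the question
ini = [10, 9, 11, 12, 5]
fin = [11, 10, 12, 13, 11]
value = [5, 0, 2, 1, 3]

acc1 = sumifs_pass(ini, fin, value)  # first accumulation column
acc2 = sumifs_pass(ini, fin, acc1)   # second pass uses the previous column
print(acc1, acc2)
```

With a DataFrame, the same function could be called as `sumifs_pass(data.ini.tolist(), data.fin.tolist(), data.v.tolist())`.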

Answer 2

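Thanks again for clarifying your question. I think I understand it now, and this approach matches the output you showed. Let me know if I have misunderstood, and let me know whether this is faster than your approach. I don't know whether it will be.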

import pandas as pd

#Create the test data
df = pd.DataFrame({
    'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'ini': {0: 10, 1: 9, 2: 11, 3: 12, 4: 5},
    'fin': {0: 11, 1: 10, 2: 12, 3: 13, 4: 11},
    'value': {0: 5, 1: 0, 2: 2, 3: 1, 4: 3},
})

#Setup initial values
curr_value_col = 'value'
i = 0
all_value_cols = []

#The groupings stay the same throughout the loops
#so we can just group once and reuse it for speed benefit
gb = df.groupby('fin')

#Loop forever until we break
while True:
    #update the loop number and add to the value col list
    i += 1
    all_value_cols.append(curr_value_col)
    
    #group by fin and sum the value_col values
    fin_cumsum = gb[curr_value_col].sum()
    
    #map the sums to the new column
    next_val_col = 'accumulation{}'.format(i)
    df[next_val_col] = df['ini'].map(fin_cumsum).fillna(0).astype(int)
    
    #If the new column we added sums to 0, then quit
    #(I think this is what you were saying you wanted, but I'm not sure)
    curr_value_col = next_val_col
    if df[curr_value_col].sum() == 0:
        break
        
    
#Get the cumulative sum from the list of columns we've been saving
df['sumOfAccumulation'] = df[all_value_cols].sum(axis=1)
df
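
To check whether it is actually faster on your side, something like the following could be used. It is only a sketch: the synthetic data (size, value ranges, random seed) and the function names `pass_loop` and `pass_groupby` are my assumptions, not your real table, and it times a single pass rather than the full loop.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data: the size and value ranges are arbitrary assumptions
rng = np.random.default_rng(0)
n = 2000
data = pd.DataFrame({
    "ini": rng.integers(0, n, size=n),
    "fin": rng.integers(0, n, size=n),
    "value": rng.integers(0, 10, size=n),
})


def pass_loop(df, col):
    """One pass with a boolean mask per row, as in the question."""
    return [int(df.loc[df["fin"] == i, col].sum()) for i in df["ini"]]


def pass_groupby(df, col):
    """One pass with a single groupby followed by a map, as in this answer."""
    sums = df.groupby("fin")[col].sum()
    return df["ini"].map(sums).fillna(0).astype(int).tolist()


t0 = time.perf_counter()
slow = pass_loop(data, "value")
t1 = time.perf_counter()
fast = pass_groupby(data, "value")
t2 = time.perf_counter()

print("same result:", slow == fast)
print(f"loop: {t1 - t0:.3f} s, groupby/map: {t2 - t1:.3f} s")
```

Both functions should return the same list, so the printed timings compare like with like. The actual numbers will depend on your machine and on the size of the real table.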

Answer 1: 0 2021-11-12 17:55:02

Answer 2: 0 2021-11-12 19:25:10
