Reducing execution time for SUMIFS equivalent
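I am trying to reproduce the Excel function SUMIFS in Python. The formula is roughly: accumulation1 = SUMIFS(value; $fin$1:$fin$5; ini$1)
What the formula does: for each row, it finds the rows whose fin equals that row's ini and sums their values.
Example for id 3 and accumulation1: find and sum the values of the rows whose fin equals this row's ini (ini = 11), i.e. the values of id 1 and id 5: (3 + 5) = 8.
Then I create a new accumulation column and run the same calculation again on it (I have to do this 1004 times...).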
id | ini | fin | value | accumulation1 | accumulation2 | cumulative sum |
---|---|---|---|---|---|---|
1 | 10 | 11 | 5 | 0 | 0 | 5 |
2 | 9 | 10 | 0 | 0 | 0 | 0 |
3 | 11 | 12 | 2 | 8 | 0 | 10 |
4 | 12 | 13 | 1 | 2 | 8 | 11 |
5 | 05 | 11 | 3 | 0 | 0 | 3 |
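To make the SUMIFS semantics concrete, here is a minimal sketch that reproduces the accumulation1 column of the table with a direct row-by-row translation of the formula. The helper name `sumifs_for_row` is hypothetical, and the column names are assumed from the table.

```python
import pandas as pd

# Sample rows from the table (column names assumed from the question).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "ini": [10, 9, 11, 12, 5],
    "fin": [11, 10, 12, 13, 11],
    "value": [5, 0, 2, 1, 3],
})


def sumifs_for_row(frame, row_ini, value_col="value"):
    """Hypothetical helper: sum value_col over the rows whose fin equals row_ini."""
    return int(frame.loc[frame["fin"] == row_ini, value_col].sum())


# id 3 has ini = 11; the rows with fin = 11 are id 1 and id 5.
acc_id3 = sumifs_for_row(df, 11)

# The whole accumulation1 column, computed one row at a time.
accumulation1 = [sumifs_for_row(df, ini) for ini in df["ini"]]
print(acc_id3, accumulation1)
```

This is the slow, literal form: every row triggers a full scan of the fin column.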
My accumulation code currently looks like this:
import time
import pandas.io.sql as pdsql
import psycopg2
import psycopg2.extras

start_time = time.time()
# DB_NAME, DB_USER, DB_PWD, DB_HOST and DB_PORT are assumed to be defined elsewhere
connection = psycopg2.connect(dbname=DB_NAME, user=DB_USER, password=DB_PWD, host=DB_HOST, port=DB_PORT)
cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
data = pdsql.read_sql_query("select id_bdcarth, id_nd_ini::int ini, id_nd_fin::int fin, v from tempturbi.tmp_somme_v19", connection)
Endtest = 1
# loop until Endtest == 0:
#     create a new accumulation column
for i in data.ini:
    acc = data.v.loc[data.fin == i]  # get values of the upstream segments
    acc = sum(acc)
    # save acc in accumulation
# Endtest = data[accumulation].sum()  (pseudocode: total of the new column)
print("--- %s seconds ---" % (time.time() - start_time))
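Even without saving the computed results, the script takes 129 seconds to run, which is much slower than Excel. Is there any way to improve the script and make it faster?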
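One likely reason for the slowness is that the loop rescans the whole fin column once per row, so the work grows with the square of the number of rows. A minimal sketch of a single-pass alternative, using only the standard library: total the values per fin once, then look each ini up in that table. The function name `accumulate_once` and the sample lists are illustrative, not taken from the original script.

```python
from collections import defaultdict

# Sample data from the table in the question.
ini = [10, 9, 11, 12, 5]
fin = [11, 10, 12, 13, 11]
value = [5, 0, 2, 1, 3]


def accumulate_once(ini, fin, values):
    """One accumulation pass: for each row, sum `values` over rows whose fin equals its ini."""
    totals = defaultdict(int)
    # Single pass over the data: total per fin node.
    for f, v in zip(fin, values):
        totals[f] += v
    # Dictionary lookup per row instead of a full scan; missing nodes count as 0.
    return [totals.get(i, 0) for i in ini]


accumulation1 = accumulate_once(ini, fin, value)
accumulation2 = accumulate_once(ini, fin, accumulation1)
print(accumulation1, accumulation2)
```

Each pass is then linear in the number of rows, and feeding the previous column back in gives the next accumulation column.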
So I made some modifications:
z = 0
loop = [0, 1, 2]
# while total != 0:
for total in loop:
    z = z + 1
    acc = 'acc' + str(z)
    v = data.iloc[:, -1]  # last column: the values from the previous pass
    vals = []
    # for each i in ini
    for i in data.ini:
        vals.append(v.loc[data.fin == i].sum())
    # create the column and store the values (one value per row)
    data[acc] = vals
    print(data[acc].sum())
print(data)
print("--- %s seconds ---" % (time.time() - start_time))
(This does not change the execution time.)
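Thanks again for clarifying your question. I think I understand now, and this approach matches the output you showed. Let me know if I've misunderstood, and let me know whether this is faster than your approach. I don't know whether it will be.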
import pandas as pd

# Create the test data
df = pd.DataFrame({
    'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'ini': {0: 10, 1: 9, 2: 11, 3: 12, 4: 5},
    'fin': {0: 11, 1: 10, 2: 12, 3: 13, 4: 11},
    'value': {0: 5, 1: 0, 2: 2, 3: 1, 4: 3},
})

# Set up initial values
curr_value_col = 'value'
i = 0
all_value_cols = []

# The groupings stay the same throughout the loops,
# so we can just group once and reuse it for a speed benefit
gb = df.groupby('fin')

# Loop forever until we break
while True:
    # Update the loop number and add to the value col list
    i += 1
    all_value_cols.append(curr_value_col)

    # Group by fin and sum the value_col values
    fin_cumsum = gb[curr_value_col].sum()

    # Map the sums to the new column
    next_val_col = 'accumulation{}'.format(i)
    df[next_val_col] = df['ini'].map(fin_cumsum).fillna(0).astype(int)

    # If the new column we added sums to 0, then quit
    # (I think this is what you were saying you wanted, but I'm not sure)
    curr_value_col = next_val_col
    if df[curr_value_col].sum() == 0:
        break

# Get the cumulative sum from the list of columns we've been saving
df['sumOfAccumulation'] = df[all_value_cols].sum(axis=1)
df
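The `while True` loop above stops only when a pass produces an all-zero column, so it would never terminate if the ini/fin links contained a cycle. Below is a sketch of the same groupby-and-map idea wrapped in a function with a pass limit. The function name `accumulate_upstream` and the `max_passes` parameter (default 2000, an arbitrary bound above the 1004 passes mentioned in the question) are assumptions for illustration. Unlike the code above, it regroups on every pass, stops when every entry of the new column is zero, and does not keep that final all-zero column.

```python
import pandas as pd


def accumulate_upstream(df, value_col="value", max_passes=2000):
    """Repeat the SUMIFS-style pass until a pass is all zeros or max_passes is reached.

    Hypothetical helper: expects columns 'ini', 'fin' and value_col.
    """
    out = df.copy()
    cols = [value_col]
    current = value_col
    for n in range(1, max_passes + 1):
        # Total of the current column per fin node.
        sums = out.groupby("fin")[current].sum()
        # Look up each row's ini in those totals; unmatched nodes count as 0.
        nxt = out["ini"].map(sums).fillna(0)
        if (nxt == 0).all():
            break
        name = f"accumulation{n}"
        out[name] = nxt.astype(out[value_col].dtype)
        cols.append(name)
        current = name
    # Row-wise total of the original values and every accumulation column.
    out["sumOfAccumulation"] = out[cols].sum(axis=1)
    return out


sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "ini": [10, 9, 11, 12, 5],
    "fin": [11, 10, 12, 13, 11],
    "value": [5, 0, 2, 1, 3],
})
result = accumulate_upstream(sample)
print(result)

# A two-node cycle never reaches an all-zero pass; max_passes stops it.
cyclic = pd.DataFrame({"ini": [1, 2], "fin": [2, 1], "value": [1, 1]})
capped = accumulate_upstream(cyclic, max_passes=5)
```

Whether regrouping on each pass costs noticeably more than reusing one groupby object would need to be measured on the real data.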