I am using Azure Machine Learning Studio and want to add a running total to my dataset. The dataset includes a date column, and for each row I want to sum all the rows in the same group dated on or before that row's date.
In SQL Server, I would use:
SELECT [t1].*,
       SUM([t1].[Amount (Settlement CCY)])
           OVER (
               PARTITION BY [t1].[Contract Ref], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
               ORDER BY [t1].[Transaction Date] ASC
               ROWS BETWEEN UNBOUNDED PRECEDING
                        AND CURRENT ROW
           )
FROM [t1]
GROUP BY [t1].[contract ref], [t1].[Transaction date], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
but Azure Machine Learning uses SQLite, where the OVER / PARTITION BY clauses aren't implemented.
I've tried an alternative in Python/pandas:
dataframe1 = dataframe1.assign(cumAMTscTD=dataframe1.groupby(['ContractRef', 'Basis', 'LOBCode', 'Superline', 'Occupation', 'TransType', 'SettCCY'])['AmtSettCCY'].transform('sum')).sort_values(['ContractRef','TransDate'])
but this sums everything in the group, not just the rows dated up to the current row. I assume, therefore, that it doesn't cover the:
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
How would I achieve this?
In SQLite, you can implement the logic as:
with t as (
      select t1.contract_ref, t1.transaction_date, sum(t1.amount) as amount
      from t1
      group by t1.contract_ref, t1.transaction_date
     )
select t.*,
       (select sum(t2.amount)
        from t t2
        where t2.contract_ref = t.contract_ref and
              t2.transaction_date <= t.transaction_date
       ) as running_amount
from t;
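If you would rather stay in pandas, the same two steps can probably be done with a grouped cumulative sum instead of transform('sum'). The following is only a minimal sketch: the sample rows are made up, only ContractRef is used as the grouping key for brevity, and the column names are taken from the pandas snippet in the question, so adjust them to your real dataset.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's pandas snippet.
df = pd.DataFrame({
    "ContractRef": ["A", "A", "A", "B", "B"],
    "TransDate": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-01", "2020-01-15", "2020-03-01"]
    ),
    "AmtSettCCY": [10.0, 5.0, 20.0, 7.0, 3.0],
})

# Assumption: extend this list with LOBCode, Superline, etc. as needed.
group_cols = ["ContractRef"]

# Step 1: collapse to one row per group and date (mirrors the CTE's GROUP BY).
daily = df.groupby(group_cols + ["TransDate"], as_index=False)["AmtSettCCY"].sum()

# Step 2: order by date, then take a cumulative sum within each group.
daily = daily.sort_values(group_cols + ["TransDate"])
daily["running_amount"] = daily.groupby(group_cols)["AmtSettCCY"].cumsum()

print(daily)
```

Note that cumsum() accumulates row by row, like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, while the <= comparison in the SQLite query includes every row sharing the same date. Aggregating to one row per group and date first makes the two approaches agree.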