
Partition by Rows equivalent in pandas (python)

I am using Azure Machine Learning Studio and want to add a running total to my dataset. The dataset includes a date column, and for each row I want to sum all the rows (for a group) dated on or before that row's date.

In SQL Server, I would use:

    SELECT [t1].*,
           SUM([t1].[Amount (Settlement CCY)])
           OVER (
             PARTITION BY [t1].[Contract Ref], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
             ORDER BY     [t1].[Transaction Date] ASC
             ROWS BETWEEN UNBOUNDED PRECEDING
                  AND     CURRENT ROW
           )
    FROM [t1]
    GROUP BY [t1].[contract ref], [t1].[Transaction date], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]

but Azure Machine Learning uses SQLite, where the OVER / PARTITION BY clauses aren't implemented.

I've tried an alternative in python/pandas:

    dataframe1 = dataframe1.assign(
        cumAMTscTD=dataframe1.groupby(
            ['ContractRef', 'Basis', 'LOBCode', 'Superline', 'Occupation', 'TransType', 'SettCCY']
        )['AmtSettCCY'].transform('sum')
    ).sort_values(['ContractRef', 'TransDate'])

but this sums everything in the group, not just the rows dated on or before the current row. I assume, therefore, that it doesn't cover the:

    ROWS BETWEEN UNBOUNDED PRECEDING
         AND     CURRENT ROW

How would I achieve this?

In SQLite, you can implement the logic as:

    with t as (
          select t1.contract_ref, t1.transaction_date, sum(t1.amount) as amount
          from t1
          group by t1.contract_ref, t1.transaction_date
         )
    select t.*,
           (select sum(t2.amount)
            from t t2
            where t2.contract_ref = t.contract_ref and
                  t2.transaction_date <= t.transaction_date
           ) as running_amount
    from t;
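
For the pandas side of the question, the same logic can probably be expressed with a grouped cumulative sum rather than `transform('sum')`. The following is a minimal sketch, not a tested drop-in for the asker's data: the helper name `add_running_total`, the output column `running_amount`, and the sample rows are all hypothetical, and only `ContractRef` is used as the group key to mirror the simplified SQLite query above. Like that query, it first collapses rows that share a group and date, then accumulates in date order.

```python
import pandas as pd


def add_running_total(df, group_cols, date_col, amount_col, out_col="running_amount"):
    """Return one row per (group, date) with a per-group running total.

    Hypothetical helper for illustration; the names are not from the question.
    """
    # Collapse rows sharing the same group and date, like the CTE in the SQLite query.
    daily = df.groupby(group_cols + [date_col], as_index=False)[amount_col].sum()
    # Order rows within each group by date (the ORDER BY of the window).
    daily = daily.sort_values(group_cols + [date_col]).reset_index(drop=True)
    # Cumulative sum within each group: UNBOUNDED PRECEDING up to CURRENT ROW.
    daily[out_col] = daily.groupby(group_cols)[amount_col].cumsum()
    return daily


# Made-up sample data using column names from the question's pandas attempt.
sample = pd.DataFrame({
    "ContractRef": ["A", "A", "A", "A", "B", "B"],
    "TransDate": pd.to_datetime([
        "2020-01-03", "2020-01-01", "2020-01-02",
        "2020-01-01", "2020-01-01", "2020-01-05",
    ]),
    "AmtSettCCY": [30.0, 10.0, 20.0, 5.0, 5.0, 7.0],
})

result = add_running_total(sample, ["ContractRef"], "TransDate", "AmtSettCCY")
print(result)
```

To group by the full key, extend `group_cols` with the remaining columns (`LOBCode`, `Superline`, and so on). If the per-date aggregation step is skipped, `cumsum()` accumulates row by row, so rows sharing a date get different running totals; the `<=` comparison in the SQLite subquery would instead give them the same total.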

