简体   繁体   中英

Aggregations in SQLite

I am looking to construct a SQL query which is able to sum up installment values by month. Ordinarily this wouldn't be too hard as you would simply sum the installment values and group by month. However, the problem is not that easy and in the rest of the post, I'll illustrate why and solicit any help people are able to offer.

First it's important to note the installments column. Where installments is 1, that means the total value is paid at the time of the purchase. Where installments is greater than 1, this means that the total value is paid in the current month as well as the following months. For example, if we see transaction_id 9 and 10, this is a $100 transaction that has 2 installments which means that $50 will be paid in February and $50 will be paid in March.

Consider that we want to see the monthly bills for credit_card_id = 11111111 . If we look to the installments column, we can see that the correct output should be as follows:

  • January: 19.99 + 75.3
  • February: 1337 + 75.3
  • March: 75.3

Again, just to be clear, the 75.3 in March occurs because we had a 3 installment transaction in January meaning the customer would be charged 75.3 in January, February, and March. The issue is that I don't know how to create the category for March from the given data.

交易

First, I re-created the table in SQL and was easily able to get all the transactions for the card by month with the following SQLite query

select strftime('%m', transaction_date) as Month, total_value, installment_value, installments 
from transactions 
WHERE credit_card_id = '11111111';

which outputs a table that looks like this 产量

However, it wasn't obvious how to split the 3 installment periods into 01, 02, and 03, so I created a new table with a column txn which is meant to give an id to the unique transactions that can be thought of as 1 group.

CREATE TABLE transactions (
    transaction_id int primary key,
    credit_card_id int,
    transaction_date timestamp,
    merchant_name varchar(256),
    total_value decimal(19,4),
    installment_value decimal(19,4),
    installments int,
    txn int
);

insert into transactions values(1,11111111,'2018-01-10T00:00:00','Colorful Soaps', 19.99, 19.99, 1, 1);
insert into transactions values(2,22222222,'2018-01-11T00:01:00','Cantina da Mamma',43.5,43.5,1,2);
insert into transactions values(3,33333333,'2018-01-12T01:02:00','Boulevard Hotel',129,129,1,3);
insert into transactions values(4,11111111,'2018-01-15T11:11:11','Micas Bar',225.9,75.3,3,4);
insert into transactions values(5,11111111,'2018-01-15T11:11:11','Micas Bar',225.9,75.3,3,4);
insert into transactions values(6,11111111,'2018-01-15T11:11:11','Micas Bar',225.9,75.3,3,4);
insert into transactions values(7,22222222,'2018-01-18T22:10:01','IPear Store',9999.99,9999.99,1,5);
insert into transactions values(8,11111111,'2018-02-20T21:08:32','Forrest Paintball',1337,1337,1,6);
insert into transactions values(9,44444444,'2018-02-22T00:05:30','Unicorn Costumes',100,50,2,7);
insert into transactions values(10,44444444,'2018-02-22T00:05:30','Unicorn Costumes',100,50,2,7);

My questions are

  1. Is it possible to get an output of the format I identified above in SQLite and if so, how?
  2. Do I have to have the txn column in order to get this information?

Thank you for your help.

SQLlite has ROW_NUMBER() (checked here SQLlite ).

   SELECT 
        installment_month
        ,credit_card_id
        ,SUM(installment_value) 
    FROM (
        SELECT 
            CASE WHEN strftime('%m',transaction_date) + ROW_NUMBER () OVER(PARTITION BY credit_card_id, transaction_date ORDER BY transaction_date) - 1 > 12
                    THEN strftime('%Y',transaction_date)*100+strftime('%m',transaction_date) + ROW_NUMBER () OVER(PARTITION BY credit_card_id, transaction_date ORDER BY transaction_date) - 1 + 88
                    ELSE strftime('%Y',transaction_date)*100+strftime('%m',transaction_date) + ROW_NUMBER () OVER(PARTITION BY credit_card_id, transaction_date ORDER BY transaction_date) - 1
            END as installment_month
            ,* 
        from transactions
        ) AS a
    GROUP by installment_month, credit_card_id

An issue comes when installments exceed two years. You will have to work that part. I would write this pice of code (case when...) inside a function it would make the whole query more clear.

Assuming you run SQLite 3.25+ version, consider using CTE and window function that creates a running count by same credit_card_id and transaction_dat e and uses this value to add needed months to transaction date. From there, aggregate according to new calculated date, install_date .

WITH cte AS
   (SELECT *,
       DATE(transaction_date, 
            '+' || (ROW_NUMBER() 
                      OVER(PARTITION BY transaction_date, credit_card_id 
                           ORDER BY transaction_date) - 1)
                || ' month'
            ) AS install_date       
    FROM transactions)

SELECT credit_card_id, 
       STRFTIME('%Y', install_date) AS install_year, 
       STRFTIME('%m', install_date) AS install_month, 
       SUM(installment_value) AS sum_installment_value
FROM cte 
GROUP BY credit_card_id, 
         STRFTIME('%Y', install_date), 
         STRFTIME('%m', install_date)
ORDER BY credit_card_id, 
         STRFTIME('%Y', install_date), 
         STRFTIME('%m', install_date);

Rextester Demo using PostgreSQL, since AFAIK no online fiddle (SQLFiddle, SQLiteonline, DBFiddle, etc.) supports SQLite with window functions

Here's a solution that doesn't require row numbering/will work in much older SQLite (any version that supports date() basically).. It just relies on a join to a calendar table (that you can generate using a variety of techniques, but in the linked example I just generated N rows by creating a table and doing a direct insert of the data I needed for the query) that has one row for the 1st day of every month. It uses a cartesian join condition that causes eg 3 rows for every payment that has 3 installments:

select 
  t.credit_card_id,
  date(cal.d, '-1 month') as month_of_installment,
  sum(t.installment_value)
from
  cal inner join transactions t on
  t.transaction_date between date(cal.d, '-'||installments||' months') and cal.d
group by
  t.credit_card_id,
  date(cal.d, '-1 month')

You can see the setup at https://www.db-fiddle.com/f/ogj2hK3cMwqp46MY6uVwX8/0

Incidentally, something is messed up in your question or your example data.

The monthly payments for credit card ID 11111111 are:

2018-01   245.89
2018-02   1562.9
2018-03   225.9

Your example data has 3 simultaneous payments to Micas Bar. We know these are different because they have a different transaction ID despite identical other data. January is thus 75.3 + 75.3 + 75.3 + 19.99, and not just 19.99 + 75.3 as the question asserts.

To see more of how the query works, run the non grouped form:

select 
  t.credit_card_id,
  date(cal.d, '-1 month') as month_of_installment,
  t.*
from
  cal inner join transactions t on
  t.transaction_date between date(cal.d, '-'||installments||' months') and cal.d
order by
  t.credit_card_id,
  date(cal.d, '-1 month')

Most DBAs I've met advocate having a numbers/dates table kicking around in a db to generate queries such as this - it's a fast way of generating sequences of rows and gives you a row to join to in cases where there are eg no transactions for a month (you can left join the transactions table onto the calendar table and get a single row, that sums to 0, for months with no transactions). Generating a bunch of months/days rows for the next 100 years is a trivial one-time op

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM