
SQL Query Formation vs SAS

A quick background: I am revisiting SQL after a really long time (about 13 years). All this while, I've been working in SAS. SAS does have a SQL procedure and I use it quite often, but now that I am working with open source tools, I am realizing that SAS's SQL constructs are highly customized. Writing queries in SAS was comparatively a lot simpler than it is in pure SQL (MariaDB). I can very well attribute that to my lack of knowledge of SQL.

Problem: I've been trying to build a personal finance management dashboard (while simultaneously learning Python/MySQL/PHP). In the context of this problem, I've created two tables:

Table 1 (mutual_fund_all) - contains the mutual fund ID and the current NAV. Updated NAV information is appended to this table automatically for all the mutual funds, so I have about 10,000 mutual funds and their time series NAV data.

Table 2 (owned mutual funds) - this one contains all the funds I've purchased, the price at which I bought them and the total number of units owned.

Now I want to merge these tables so that I can see the mutual fund name, purchase cost and profit for each of the funds.

In SAS, I would have created a couple of temp tables and then merged the needed info to get the required result. In SQL, I am not sure whether I can do something similar, and my limited knowledge is pushing me to do everything in one single query. Since I am stuck, I need your help.

Here is what I've written:

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit

FROM    
    mutual_funds            a,
    mf_purchase_summary     b 
where 
    a.mf_id=b.mf_id
group BY    
    b.owner, a.mf_name

This one is somewhat working, but it's not giving me the correct info since it's probably pulling the wrong NAV entry from the mutual_funds table. I need only the latest available NAV (the table has a load_date field and I just want the NAV from the record where load_date is max). I just haven't been able to do it in SQL.

In SAS, in the first step I would have fetched only the owned mutual fund records from the mutual_funds table. Then in the second step I would have sorted those filtered records by descending load_date, pulled only the top record for each mutual fund and, with the NAV fetched, gone ahead with the calculations.

Can I do something similar in SQL? It'd greatly simplify my effort (and would also make the overall code more readable/segmented).
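For what it's worth, the staged SAS-style workflow described above can be approximated with temporary tables. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MariaDB so that it runs without a server, the sample rows are made up, and the temp table names owned_nav and latest_date are hypothetical. The column names (mf_id, mf_name, mf_nav, load_date, owner, units, purchase_price) are taken from the queries in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, units REAL, purchase_price REAL);

-- Made-up sample data: fund 1 has two NAV loads, fund 2 is not owned.
INSERT INTO mutual_funds VALUES
    (1, 'Fund A', 10.0, '2020-01-01'),
    (1, 'Fund A', 12.0, '2020-01-02'),
    (2, 'Fund B', 50.0, '2020-01-02');
INSERT INTO mf_purchase_summary VALUES ('me', 1, 100, 1000.0);

-- Step 1: keep only the NAV rows of funds that are actually owned.
CREATE TEMPORARY TABLE owned_nav AS
SELECT * FROM mutual_funds
WHERE mf_id IN (SELECT mf_id FROM mf_purchase_summary);

-- Step 2: find the latest load date per owned fund
-- (ISO-formatted date strings compare correctly as text).
CREATE TEMPORARY TABLE latest_date AS
SELECT mf_id, MAX(load_date) AS max_load_date
FROM owned_nav
GROUP BY mf_id;
""")

# Step 3: join the latest NAV row to the purchases and compute the profit.
rows = conn.execute("""
SELECT b.owner, o.mf_name, b.purchase_price,
       b.units * o.mf_nav - b.purchase_price AS profit
FROM owned_nav o
INNER JOIN latest_date d
        ON d.mf_id = o.mf_id AND d.max_load_date = o.load_date
INNER JOIN mf_purchase_summary b
        ON b.mf_id = o.mf_id
""").fetchall()
print(rows)
```

Each step lives in its own statement, which keeps the code segmented much like separate SAS steps. MariaDB also accepts CREATE TEMPORARY TABLE ... AS SELECT, though the details may differ from SQLite.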

SELECT
       b.owner,
       a.mf_name,
       SUM( b.purchase_price ) as purchase_price,
       SUM( b.units*a.mf_nav - b.purchase_price ) as profit    
  FROM    
       mutual_funds            a
INNER JOIN
       mf_purchase_summary     b 
    ON
       a.mf_id=b.mf_id
 GROUP BY    
       b.owner, a.mf_name;

Consider joining to an aggregate derived table. Here, I try to break down your SAS steps. The overall solution should be fully compliant with SAS's proc sql and any other ANSI-compliant SQL dialect.

Unit Level Join (using explicit join)

In SAS, in the first step I would have fetched only the owned mutual fund records from the mutual_funds table.

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON a.mf_id=b.mf_id
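To see why the unit-level join on its own is not enough, here is a small sketch that runs it against made-up data. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB, and the sample rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, units REAL, purchase_price REAL);

-- Made-up sample data: the same fund loaded on two different dates.
INSERT INTO mutual_funds VALUES
    (1, 'Fund A', 10.0, '2020-01-01'),
    (1, 'Fund A', 12.0, '2020-01-02');
INSERT INTO mf_purchase_summary VALUES ('me', 1, 100, 1000.0);
""")

# The unit-level join: every NAV row of an owned fund is matched.
rows = conn.execute("""
SELECT b.owner, a.mf_name, b.purchase_price,
       b.units * a.mf_nav - b.purchase_price AS profit
FROM mutual_funds a
INNER JOIN mf_purchase_summary b ON a.mf_id = b.mf_id
""").fetchall()

for row in rows:
    print(row)
```

With this data the single purchase comes back twice, once per NAV load, with a different profit each time. That is the reason the join still needs to be restricted to the latest load_date.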

Aggregate Level

Then in the second step I would have sorted those filtered records by descending load_date, pulled only the top record for each mutual fund and, with the NAV fetched, gone ahead with the calculations.

SELECT        
    a.mf_id,
    a.mf_name,
    MAX(a.load_date) As max_load_date
FROM    
    mutual_funds            a
GROUP BY a.mf_id,
         a.mf_name
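As a quick check of what the aggregate returns, the sketch below runs it on made-up rows. It assumes Python's built-in sqlite3 module in place of MariaDB and hypothetical sample data; dates are stored as ISO-formatted text so that MAX picks the most recent one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);

-- Made-up sample data: fund 1 has two NAV loads, fund 2 has one.
INSERT INTO mutual_funds VALUES
    (1, 'Fund A', 10.0, '2020-01-01'),
    (1, 'Fund A', 12.0, '2020-01-02'),
    (2, 'Fund B', 50.0, '2020-01-02');
""")

# One row per fund, carrying only its most recent load date.
latest = sorted(conn.execute("""
SELECT a.mf_id, a.mf_name, MAX(a.load_date) AS max_load_date
FROM mutual_funds a
GROUP BY a.mf_id, a.mf_name
""").fetchall())
print(latest)
```

Note that the aggregate does not return mf_nav itself. It only identifies which (mf_id, load_date) pair is the latest, which is why it has to be joined back to mutual_funds.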

Overall Query (joins unit level with derived table aggregate on mf_id and load_date)

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON a.mf_id=b.mf_id
INNER JOIN
    (SELECT        
         a.mf_id,
         a.mf_name,
         MAX(a.load_date) As max_load_date
     FROM    
         mutual_funds            a
     GROUP BY a.mf_id,
              a.mf_name) As agg
ON agg.mf_id = a.mf_id 
AND agg.max_load_date = a.load_date
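The overall query can be tried end to end with a small sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB, and the sample rows are made up for illustration; the SQL itself is the overall query from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, units REAL, purchase_price REAL);

-- Made-up sample data: fund 1 has two NAV loads, fund 2 is not owned.
INSERT INTO mutual_funds VALUES
    (1, 'Fund A', 10.0, '2020-01-01'),
    (1, 'Fund A', 12.0, '2020-01-02'),
    (2, 'Fund B', 50.0, '2020-01-02');
INSERT INTO mf_purchase_summary VALUES ('me', 1, 100, 1000.0);
""")

# Unit-level join restricted to the latest load_date per fund
# by joining to the aggregate derived table.
result = conn.execute("""
SELECT b.owner, a.mf_name, b.purchase_price,
       b.units * a.mf_nav - b.purchase_price AS profit
FROM mutual_funds a
INNER JOIN mf_purchase_summary b
        ON a.mf_id = b.mf_id
INNER JOIN (SELECT a.mf_id, a.mf_name, MAX(a.load_date) AS max_load_date
            FROM mutual_funds a
            GROUP BY a.mf_id, a.mf_name) AS agg
        ON agg.mf_id = a.mf_id
       AND agg.max_load_date = a.load_date
""").fetchall()
print(result)
```

With this data only the row for the latest load (NAV 12.0) survives, giving a profit of 100 * 12.0 - 1000.0 = 200.0 for the one owned fund.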

So the final query that worked for me is:

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON 
    a.mf_id=b.mf_id
INNER JOIN
    (SELECT        
         mf_id,
         MAX(load_dt) as max_load_date
     FROM    
         mutual_funds            
     GROUP BY mf_id)        c 
ON 
    a.mf_id = c.mf_id 
    AND 
    c.max_load_date = a.load_dt

However, I'll continue to improve it. I want to incorporate the recommendation made by Parfait and will update the answer once done.


 