A quick background: I am revisiting SQL after a really long time (about 13 years). All this while I've been working in SAS. SAS has a SQL procedure, and I use it quite often, but now that I am working with open-source tools I am realizing that SAS's SQL constructs are highly customized. Writing queries was comparatively much simpler in SAS than in pure SQL (MariaDB). I can very well attribute that to my lack of knowledge of SQL.
Problem: I've been trying to create a personal finance management dashboard (while simultaneously trying to learn Python/MySQL/PHP). Speaking only in the context of the problem, I've created two tables:
Table 1 (mutual_fund_all) - contains the mutual fund ID and the current NAV. Updated NAV information is appended to this table automatically for all the mutual funds, so I have about 10,000 mutual funds and their time-series NAV data.
Table 2 (owned mutual funds) - contains all the funds I've purchased, the price at which I bought them, and the total number of units owned.
Now I want to merge these tables so that I can see the mutual fund name, the purchase cost, and the profit for each of the funds.
In SAS, I would have created a couple of temp tables and then merged them to get the required info. In SQL, I am not sure whether I can do something similar, and my limited knowledge is more or less forcing me to do everything in one single query. Since I am stuck, I need your help.
Here is what I've written:
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a,
mf_purchase_summary b
where
a.mf_id=b.mf_id
group BY
b.owner, a.mf_name
This somewhat works, but it doesn't give me the correct info, probably because it pulls the wrong NAV entry from the mutual_funds table. I need only the latest available NAV (the table has a load_date field, and I just want to use the NAV from the record where load_date is the maximum). I just can't work out how to do it in SQL.
In SAS, in the first step I'd have first fetched only the owned mutual fund records from the mutual_funds table. Then in the second step I would have sorted those filtered records with the descending load_date, would have pulled only the top records for each mutual fund and with the nav fetched, would have gone ahead with the calculations.
Can I do something similar in SQL? It would greatly simplify my effort (and would also make the overall code more readable and segmented).
SELECT
b.owner,
a.mf_name,
SUM( b.purchase_price ) as purchase_price,
SUM( b.units*a.mf_nav - b.purchase_price ) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON
a.mf_id=b.mf_id
GROUP BY
b.owner, a.mf_name;
Consider joining to an aggregate derived table. Below, I try to break down your SAS steps. The overall solution should work in SAS's proc sql and in any other ANSI-compliant SQL dialect.
Unit Level Join (using explicit join)
In SAS, in the first step I'd have first fetched only the owned mutual fund records from the mutual_funds table.
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON a.mf_id=b.mf_id
Aggregate Level
Then in the second step I would have sorted those filtered records with the descending load_date, would have pulled only the top records for each mutual fund and with the nav fetched, would have gone ahead with the calculations.
SELECT
a.mf_id,
a.mf_name,
MAX(a.load_date) As max_load_date
FROM
mutual_funds a
GROUP BY a.mf_id,
a.mf_name
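If you prefer to materialize this intermediate step the way you would in SAS, the aggregate can also be staged in a temporary table. This is only a sketch, assuming MariaDB's CREATE TEMPORARY TABLE ... AS SELECT; the name tmp_latest_load is hypothetical, and the column names follow the query above.

```sql
-- Hypothetical staging table; it lives only for the current session.
DROP TEMPORARY TABLE IF EXISTS tmp_latest_load;

-- One row per fund, holding the most recent load date.
CREATE TEMPORARY TABLE tmp_latest_load AS
SELECT
    mf_id,
    MAX(load_date) AS max_load_date
FROM mutual_funds
GROUP BY mf_id;

-- Inspect the staged result before using it in later steps.
SELECT * FROM tmp_latest_load LIMIT 10;
```

Once created, the temporary table can be joined like any other table for the rest of the session.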
Overall Query (joins unit level with derived table aggregate on mf_id and load_date)
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON a.mf_id=b.mf_id
INNER JOIN
(SELECT
a.mf_id,
a.mf_name,
MAX(a.load_date) As max_load_date
FROM
mutual_funds a
GROUP BY a.mf_id,
a.mf_name) As agg
ON agg.mf_id = a.mf_id
AND agg.max_load_date = a.load_date
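If your server supports common table expressions (MariaDB 10.2 and later does), the same logic can be laid out as named steps, which may feel closer to a sequence of SAS steps. This is a sketch under that assumption, not a tested drop-in; the CTE names latest_load and latest_nav are hypothetical, while the tables and columns are the ones used above.

```sql
WITH latest_load AS (
    -- Step 1: most recent load date per fund
    SELECT mf_id, MAX(load_date) AS max_load_date
    FROM mutual_funds
    GROUP BY mf_id
),
latest_nav AS (
    -- Step 2: keep only the NAV row loaded on that date
    SELECT m.mf_id, m.mf_name, m.mf_nav
    FROM mutual_funds m
    INNER JOIN latest_load l
        ON l.mf_id = m.mf_id
       AND l.max_load_date = m.load_date
)
-- Step 3: join the latest NAV to the purchases and compute the profit
SELECT
    p.owner,
    n.mf_name,
    p.purchase_price,
    p.units * n.mf_nav - p.purchase_price AS profit
FROM mf_purchase_summary p
INNER JOIN latest_nav n
    ON n.mf_id = p.mf_id;
```

Each step can be checked on its own by temporarily replacing the final SELECT with SELECT * FROM latest_load or SELECT * FROM latest_nav.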
So the final query that worked for me is:
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON
a.mf_id=b.mf_id
INNER JOIN
(SELECT
mf_id,
mf_name,
MAX(load_dt) as max_load_date
FROM
mutual_funds
GROUP BY mf_id) c
ON
a.mf_id = c.mf_id
AND
c.max_load_date = a.load_dt
However, I will continue to improve it. I want to incorporate the recommendation made by Parfait and will update the answer once done.
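One possible refinement: the derived table above selects mf_name while grouping only by mf_id, which MariaDB rejects when the ONLY_FULL_GROUP_BY SQL mode is enabled. A window function avoids that and maps directly onto "sort by load date descending and keep the top record per fund". This is a sketch assuming MariaDB 10.2 or later (where ROW_NUMBER() is available) and the load_dt column used in the query above; the aliases p, r and rn are only illustrative.

```sql
SELECT
    p.owner,
    r.mf_name,
    p.purchase_price,
    p.units * r.mf_nav - p.purchase_price AS profit
FROM mf_purchase_summary p
INNER JOIN (
    -- Number each fund's NAV rows from newest to oldest
    SELECT
        mf_id,
        mf_name,
        mf_nav,
        ROW_NUMBER() OVER (PARTITION BY mf_id ORDER BY load_dt DESC) AS rn
    FROM mutual_funds
) r
    ON r.mf_id = p.mf_id
   AND r.rn = 1;  -- rn = 1 is the latest NAV row for each fund
```

Unlike the MAX(load_dt) join, this keeps exactly one NAV row per fund even if two rows share the same load date, although which of the tied rows is kept is then arbitrary.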