SQL Query Formation vs SAS
A quick background: I am trying to revisit SQL after a really long time (about 13 years). All this while, I've been working in SAS. There is a SQL procedure within SAS and I use it quite often, but now that I am working with open-source tools, I am realizing that SAS's SQL constructs are highly customized. It was comparatively a lot simpler to write queries in SAS than it is in pure SQL (MariaDB). I can very well attribute that to my lack of knowledge of SQL.
Problem: I've been trying to create a personal finance management dashboard (while simultaneously trying to learn Python/MySQL/PHP). Talking only in the context of the problem, I've created two tables:
Table 1 (mutual_fund_all) - this one contains the mutual fund ID and the current NAV. The updated NAV information is appended to this table automatically for all the mutual funds, so I have about 10,000 mutual funds and their time-series NAV data.
Table 2 (owned mutual funds) - this one contains all the funds I've purchased, the price at which I bought them and the total number of units owned.
Now I want to merge these tables so that I can see the mutual fund name, purchase cost and profit for each of the funds.
In SAS, I would have created a couple of temp tables and then finally merged the needed pieces to get the required result. In SQL, I am not sure if I can do something similar, and my limited knowledge is forcing me to do everything in one single query. Since I am stuck, I need your help.
Here is what I've written:
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a,
mf_purchase_summary b
where
a.mf_id=b.mf_id
group BY
b.owner, a.mf_name
This one somewhat works, but it's not giving me the correct info, since it's probably pulling the wrong NAV entry from the mutual_funds table. I need only the latest available NAV (there is a load_date field in the table, and I just want to use the NAV from the record where load_date is max). I am just not able to do that in SQL.
In SAS, in the first step I'd have fetched only the owned mutual fund records from the mutual_funds table. Then in the second step I would have sorted those filtered records by descending load_date, pulled only the top record for each mutual fund and, with the NAV fetched, gone ahead with the calculations.
Can I do something similar in SQL? It'd greatly simplify my effort (and would also make the overall code more readable/segmented).
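The SAS-style sequence of intermediate tables described above can be reproduced in SQL with temporary tables. What follows is only a minimal sketch, run through Python's sqlite3 module so that it is self-contained. The sample rows and the temp table names owned_nav, latest_date and latest_nav are hypothetical; the column names follow the query in the question. MariaDB also has CREATE TEMPORARY TABLE, but this sketch was written against SQLite, so treat the exact syntax as an assumption to check there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data. Dates are ISO strings so MAX() orders them correctly.
    CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
    CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, purchase_price REAL, units REAL);

    INSERT INTO mutual_funds VALUES
        (1, 'Fund A', 10.0, '2020-01-01'),
        (1, 'Fund A', 12.0, '2020-01-02'),
        (2, 'Fund B', 50.0, '2020-01-01'),
        (2, 'Fund B', 45.0, '2020-01-02'),
        (3, 'Fund C', 99.0, '2020-01-02');
    INSERT INTO mf_purchase_summary VALUES
        ('me', 1, 1000.0, 100.0),
        ('me', 2, 500.0, 10.0);

    -- Step 1: keep only the NAV rows of funds that are actually owned.
    CREATE TEMPORARY TABLE owned_nav AS
    SELECT mf_id, mf_name, mf_nav, load_date
    FROM mutual_funds
    WHERE mf_id IN (SELECT mf_id FROM mf_purchase_summary);

    -- Step 2: find the latest load_date for each owned fund.
    CREATE TEMPORARY TABLE latest_date AS
    SELECT mf_id, MAX(load_date) AS max_load_date
    FROM owned_nav
    GROUP BY mf_id;

    -- Step 3: keep only the NAV row recorded on that latest date.
    CREATE TEMPORARY TABLE latest_nav AS
    SELECT o.mf_id, o.mf_name, o.mf_nav
    FROM owned_nav o
    INNER JOIN latest_date d
        ON o.mf_id = d.mf_id AND o.load_date = d.max_load_date;
""")

# Final step: join the purchases to the latest NAV and compute the profit.
rows = conn.execute("""
    SELECT b.owner, n.mf_name, b.purchase_price,
           b.units * n.mf_nav - b.purchase_price AS profit
    FROM mf_purchase_summary b
    INNER JOIN latest_nav n ON n.mf_id = b.mf_id
    ORDER BY n.mf_name
""").fetchall()

for row in rows:
    print(row)
```

Each temp table corresponds to one SAS step, so the intermediate results can be inspected with a plain SELECT before moving on.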
SELECT
b.owner,
a.mf_name,
SUM( b.purchase_price ) as purchase_price,
SUM( b.units*a.mf_nav - b.purchase_price ) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON
a.mf_id=b.mf_id
GROUP BY
b.owner, a.mf_name;
Consider joining to an aggregate derived table. Here, I try to break down your SAS steps. The overall solution should be fully compliant with SAS's proc sql and any other ANSI-compliant SQL dialect.
Unit Level Join (using explicit join)
In SAS, in the first step I'd have first fetched only the owned mutual fund records from the mutual_funds table.
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON a.mf_id=b.mf_id
Aggregate Level
Then in the second step I would have sorted those filtered records with the descending load_date, would have pulled only the top records for each mutual fund and with the nav fetched, would have gone ahead with the calculations.
SELECT
a.mf_id,
a.mf_name,
MAX(a.load_date) As max_load_date
FROM
mutual_funds a
GROUP BY a.mf_id,
a.mf_name
Overall Query (joins unit level with derived table aggregate on mf_id and load_date)
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON a.mf_id=b.mf_id
INNER JOIN
(SELECT
a.mf_id,
a.mf_name,
MAX(a.load_date) As max_load_date
FROM
mutual_funds a
GROUP BY a.mf_id,
a.mf_name) As agg
ON agg.mf_id = a.mf_id
AND agg.max_load_date = a.load_date
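To see the effect of the extra join, here is a small sketch that runs the overall query against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the helper name latest_profit are hypothetical; the table and column names are the ones used in the query above. It also counts the rows of the unit-level join alone, which returns one row per NAV record rather than one per owned fund.

```python
import sqlite3

def latest_profit(conn):
    """Return (owner, mf_name, purchase_price, profit) using each fund's latest NAV."""
    return conn.execute("""
        SELECT b.owner, a.mf_name, b.purchase_price,
               b.units * a.mf_nav - b.purchase_price AS profit
        FROM mutual_funds a
        INNER JOIN mf_purchase_summary b
            ON a.mf_id = b.mf_id
        INNER JOIN (SELECT mf_id, mf_name, MAX(load_date) AS max_load_date
                    FROM mutual_funds
                    GROUP BY mf_id, mf_name) agg
            ON agg.mf_id = a.mf_id
           AND agg.max_load_date = a.load_date
        ORDER BY b.owner, a.mf_name
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data. Dates are ISO strings so MAX() orders them correctly.
    CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
    CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, purchase_price REAL, units REAL);
    INSERT INTO mutual_funds VALUES
        (1, 'Fund A', 10.0, '2020-01-01'),
        (1, 'Fund A', 12.0, '2020-01-02'),
        (2, 'Fund B', 50.0, '2020-01-01'),
        (2, 'Fund B', 45.0, '2020-01-02');
    INSERT INTO mf_purchase_summary VALUES
        ('me', 1, 1000.0, 100.0),
        ('me', 2, 500.0, 10.0);
""")

# Unit-level join only: every NAV record of an owned fund produces a row.
unit_level_rows = conn.execute("""
    SELECT COUNT(*)
    FROM mutual_funds a
    INNER JOIN mf_purchase_summary b ON a.mf_id = b.mf_id
""").fetchone()[0]

result = latest_profit(conn)
print(unit_level_rows)
print(result)
```

With this sample data the unit-level join yields four rows for two owned funds, while the overall query keeps one row per fund, priced at the most recent load_date.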
So the final query that worked for me is:
SELECT
b.owner,
a.mf_name,
(b.purchase_price) as purchase_price,
(b.units*a.mf_nav - b.purchase_price) as profit
FROM
mutual_funds a
INNER JOIN
mf_purchase_summary b
ON
a.mf_id=b.mf_id
INNER JOIN
(SELECT
mf_id,
mf_name,
MAX(load_dt) as max_load_date
FROM
mutual_funds
GROUP BY mf_id) c
ON
a.mf_id = c.mf_id
AND
c.max_load_date = a.load_dt
However, I'll continue to improve it. I want to incorporate the recommendation made by Parfait and will update the answer once done.
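One possible direction, not taken from the thread: the SAS habit of sorting by descending load date and keeping the first record per fund maps closely onto the ROW_NUMBER() window function. The sketch below assumes window-function support (MariaDB 10.2 or later; SQLite 3.25 or later, used here through Python's sqlite3 module). The sample rows and the alias names r and rn are hypothetical; the column name load_dt follows the final query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data. Dates are ISO strings so they sort chronologically.
    CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_dt TEXT);
    CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, purchase_price REAL, units REAL);
    INSERT INTO mutual_funds VALUES
        (1, 'Fund A', 10.0, '2020-01-01'),
        (1, 'Fund A', 11.0, '2020-01-02'),
        (1, 'Fund A', 12.5, '2020-01-03'),
        (2, 'Fund B', 20.0, '2020-01-02'),
        (2, 'Fund B', 18.0, '2020-01-03');
    INSERT INTO mf_purchase_summary VALUES
        ('me', 1, 1000.0, 100.0),
        ('me', 2, 400.0, 20.0);
""")

# Number each fund's NAV rows from newest to oldest, then keep only rn = 1.
rows = conn.execute("""
    SELECT b.owner, r.mf_name, b.purchase_price,
           b.units * r.mf_nav - b.purchase_price AS profit
    FROM mf_purchase_summary b
    INNER JOIN (
        SELECT mf_id, mf_name, mf_nav,
               ROW_NUMBER() OVER (PARTITION BY mf_id ORDER BY load_dt DESC) AS rn
        FROM mutual_funds
    ) r
        ON r.mf_id = b.mf_id AND r.rn = 1
    ORDER BY r.mf_name
""").fetchall()

for row in rows:
    print(row)
```

Compared with the MAX(load_dt) join, this version reads the mutual_funds table once and always returns one row per fund, even if two records happen to share the same latest load_dt.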