SQL Query Formation vs SAS

A quick background: I am trying to revisit SQL after a really long time (about 13 years). All this while, I've been working in SAS. There is a SQL procedure within SAS and I use it quite often, but now that I am working with open source tools, I am realizing that SAS's SQL constructs are highly customized. It was comparatively a lot simpler to write queries in SAS than in pure SQL (MariaDB). I can very well attribute that to my lack of knowledge of SQL.

Problem: I've been trying to create a personal finance management dashboard (while simultaneously trying to learn Python/MySQL/PHP). Talking only in the context of the problem, I've created two tables:

Table 1 (mutual_fund_all) - contains the mutual fund ID and current NAV. This table gets the updated NAV information appended automatically for all the mutual funds. So I have around 10,000 mutual funds and their time series NAV data.

Table 2 (owned mutual funds) - contains all the funds I've purchased, the price at which I bought them, and the total number of units owned.

Now I want to merge these tables so that I can see the mutual fund name, purchase cost, and profit for each of the funds.

In SAS, I would have created a couple of temp tables and then merged them to get the required info. In SQL, I am not sure if I can do something similar, and my limited knowledge is forcing me to do everything in one single query. Since I am stuck, I need your help.

Here is what I've written:

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit

FROM    
    mutual_funds            a,
    mf_purchase_summary     b 
where 
    a.mf_id=b.mf_id
group BY    
    b.owner, a.mf_name

This one is somewhat working, but it's not giving me the correct info since it's probably pulling the wrong NAV entry from the mutual_funds table. I need only the latest available NAV (I have a load_date field in the table and I just want to use the NAV from the record where load_date is max). I am just not able to do it in SQL.

In SAS, in the first step I would have fetched only the owned mutual fund records from the mutual_funds table. Then in the second step I would have sorted those filtered records by descending load_date, pulled only the top record for each mutual fund, and, with the NAV fetched, gone ahead with the calculations.

Can I do something similar in SQL? It'd greatly simplify my effort (and would also make the overall code more readable/segmented).

SELECT
       b.owner,
       a.mf_name,
       SUM( b.purchase_price ) as purchase_price,
       SUM( b.units*a.mf_nav - b.purchase_price ) as profit    
  FROM    
       mutual_funds            a
INNER JOIN
       mf_purchase_summary     b 
    ON
       a.mf_id=b.mf_id
 GROUP BY    
       b.owner, a.mf_name;

Consider joining by an aggregate derived table. Here, I try to break down your SAS steps. The overall solution should be fully compliant with SAS's proc sql and any other ANSI-compliant SQL dialect.

Unit Level Join (using explicit join)

In SAS, in the first step I would have fetched only the owned mutual fund records from the mutual_funds table.

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON a.mf_id=b.mf_id

Aggregate Level

Then in the second step I would have sorted those filtered records by descending load_date, pulled only the top record for each mutual fund, and, with the NAV fetched, gone ahead with the calculations.

SELECT        
    a.mf_id,
    a.mf_name,
    MAX(a.load_date) As max_load_date
FROM    
    mutual_funds            a
GROUP BY a.mf_id,
         a.mf_name

Overall Query (joins unit level with derived table aggregate on mf_id and load_date)

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON a.mf_id=b.mf_id
INNER JOIN
    (SELECT        
         a.mf_id,
         a.mf_name,
         MAX(a.load_date) As max_load_date
     FROM    
         mutual_funds            a
     GROUP BY a.mf_id,
              a.mf_name) As agg
ON agg.mf_id = a.mf_id 
AND agg.max_load_date = a.load_date
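
To see the overall query run against sample data, here is a minimal sketch using Python's standard-library sqlite3 module rather than MariaDB (an assumption made only so the example is self-contained). The sample rows, the fund names, and the helper name latest_nav_profit are made up for illustration; the table and column names follow the queries above, and purchase_price is treated as the total amount paid, as the profit formula implies.

```python
import sqlite3


def latest_nav_profit(conn):
    """Return (owner, mf_name, purchase_price, profit) rows using only each fund's latest NAV."""
    query = """
        SELECT b.owner,
               a.mf_name,
               b.purchase_price,
               b.units * a.mf_nav - b.purchase_price AS profit
        FROM mutual_funds a
        INNER JOIN mf_purchase_summary b
            ON a.mf_id = b.mf_id
        INNER JOIN (SELECT mf_id, MAX(load_date) AS max_load_date
                    FROM mutual_funds
                    GROUP BY mf_id) agg
            ON agg.mf_id = a.mf_id
           AND agg.max_load_date = a.load_date
        ORDER BY b.owner, a.mf_name
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data. Dates are ISO strings, so MAX() picks the latest one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
    CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, purchase_price REAL, units REAL);
    INSERT INTO mutual_funds VALUES
        (1, 'Fund A', 10.0, '2020-01-01'),
        (1, 'Fund A', 12.0, '2020-01-02'),
        (2, 'Fund B', 20.0, '2020-01-01'),
        (2, 'Fund B', 18.0, '2020-01-02'),
        (3, 'Fund C',  5.0, '2020-01-02');
    INSERT INTO mf_purchase_summary VALUES
        ('me', 1, 1000.0, 100.0),
        ('me', 2, 2000.0, 100.0);
""")

rows = latest_nav_profit(conn)
for row in rows:
    print(row)
```

Each owned fund appears once, valued at its most recent NAV; Fund C is not owned, so the inner join drops it.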

So the final query that worked for me is:

SELECT
    b.owner,
    a.mf_name,
    (b.purchase_price) as purchase_price,
    (b.units*a.mf_nav - b.purchase_price) as profit    
FROM    
    mutual_funds            a
INNER JOIN
    mf_purchase_summary     b 
ON 
    a.mf_id=b.mf_id
INNER JOIN
    (SELECT        
         mf_id,
         MAX(load_dt) as max_load_date
     FROM    
         mutual_funds            
     GROUP BY mf_id)        c 
ON 
    a.mf_id = c.mf_id 
    AND 
    c.max_load_date = a.load_dt

However, I'd continue to improve it. I'd want to incorporate the recommendation made by Parfait and shall update the answer once done.
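
The staged SAS workflow described in the question can also be reproduced with temporary tables, one step per statement. The following is only a sketch, run through Python's sqlite3 module for convenience; MariaDB also supports CREATE TEMPORARY TABLE, but the sample data, the temp table names (owned_navs, latest_dates), and the load_date column spelling are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; dates are ISO strings so MAX() finds the latest.
    CREATE TABLE mutual_funds (mf_id INTEGER, mf_name TEXT, mf_nav REAL, load_date TEXT);
    CREATE TABLE mf_purchase_summary (owner TEXT, mf_id INTEGER, purchase_price REAL, units REAL);
    INSERT INTO mutual_funds VALUES
        (1, 'Fund A', 10.0, '2020-01-01'),
        (1, 'Fund A', 12.0, '2020-01-02'),
        (2, 'Fund B', 20.0, '2020-01-01'),
        (2, 'Fund B', 18.0, '2020-01-02'),
        (3, 'Fund C',  5.0, '2020-01-02');
    INSERT INTO mf_purchase_summary VALUES
        ('me', 1, 1000.0, 100.0),
        ('me', 2, 2000.0, 100.0);

    -- Step 1: keep only the NAV records of funds that are actually owned.
    CREATE TEMPORARY TABLE owned_navs AS
        SELECT a.mf_id, a.mf_name, a.mf_nav, a.load_date
        FROM mutual_funds a
        WHERE a.mf_id IN (SELECT mf_id FROM mf_purchase_summary);

    -- Step 2: find the latest load_date for each owned fund.
    CREATE TEMPORARY TABLE latest_dates AS
        SELECT mf_id, MAX(load_date) AS max_load_date
        FROM owned_navs
        GROUP BY mf_id;
""")

# Step 3: join purchases to the latest NAV record and compute the profit.
staged_rows = conn.execute("""
    SELECT p.owner,
           n.mf_name,
           p.purchase_price,
           p.units * n.mf_nav - p.purchase_price AS profit
    FROM mf_purchase_summary p
    INNER JOIN owned_navs n
        ON n.mf_id = p.mf_id
    INNER JOIN latest_dates d
        ON d.mf_id = n.mf_id
       AND d.max_load_date = n.load_date
    ORDER BY p.owner, n.mf_name
""").fetchall()

for row in staged_rows:
    print(row)
```

The intermediate tables can be inspected on their own while debugging, which is the main appeal of staging the work this way; the single-query version with a derived table returns the same rows.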
