How to select the most recent record for each ID

I am working with financial data for the peers of a company. I have 10 peers for a particular company, and the financial data is captured at regular intervals (monthly, quarterly, etc.). However, since the data is not captured for all companies at the same time, each company ends up with a different most recent update date.

What I want to do is select the most recent row for each peer company ID, so that I end up with only 11 rows in my table (1 for my company and 10 for the peers).

Below is the code that I am running right now:

# spark is assumed to be the active SparkSession
peers_df = spark.sql('''
select * from Financials_table
where PRD_END_DT = (select max(PRD_END_DT) from Financials_table) -- Selecting the latest period end date
''')
peers_df.createOrReplaceTempView('peers_df')
print(shape('peers_df'))
head('peers_df', 50)

Note that I have a list of peers stored in peers_list, and I'd like to get the most recent PRD_END_DT for each of the peers. What I am running now returns only the rows for the overall most recent PRD_END_DT value, but not all peers have data as of that date.

There are several ways to get the most recent row per company ID. You haven't tagged your request with your DBMS, so some methods may work for you while others may not yet be supported by your DBMS. Here are some options:

Get the maximum prd_end_dt per company_id. Then select the matching rows:

select *
from financials_table
where (company_id, prd_end_dt) in
(
  select company_id, max(prd_end_dt)
  from financials_table
  group by company_id
)
order by company_id;
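
As a rough illustration of why the per-company maximum matters, here is a minimal sketch that runs both the question's global-maximum filter and the query above against an in-memory SQLite database. The sample rows and the revenue column are made up for the example, and SQLite stands in for whatever DBMS is actually used, so treat the behavior as indicative only.

```python
import sqlite3

# Hypothetical sample data; table and key column names follow the queries above.
rows = [
    ("A", "2023-03-31", 100.0),
    ("A", "2023-06-30", 110.0),
    ("B", "2023-03-31", 200.0),  # B has not reported for June yet
    ("C", "2023-05-31", 300.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table financials_table (company_id text, prd_end_dt text, revenue real)"
)
conn.executemany("insert into financials_table values (?, ?, ?)", rows)

# Global maximum: only companies that reported on the overall latest date survive.
global_latest = conn.execute("""
    select company_id, prd_end_dt
    from financials_table
    where prd_end_dt = (select max(prd_end_dt) from financials_table)
    order by company_id
""").fetchall()

# Per-company maximum: one latest date per company_id.
per_company_latest = conn.execute("""
    select company_id, prd_end_dt
    from financials_table
    where (company_id, prd_end_dt) in
    (
      select company_id, max(prd_end_dt)
      from financials_table
      group by company_id
    )
    order by company_id
""").fetchall()

print(global_latest)
print(per_company_latest)
```

With this data the first query keeps only company A, while the second keeps one row each for A, B and C. Dates are stored as ISO strings here so that text comparison matches date order; a real date column would not need that.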

Select the rows for which no newer prd_end_dt exists for the company_id:

select *
from financials_table ft
where not exists
(
  select null
  from financials_table newer
  where newer.company_id = ft.company_id
  and newer.prd_end_dt > ft.prd_end_dt
)
order by company_id;
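
One thing worth checking with this approach is what happens when a company has two rows with the same latest prd_end_dt: both are kept, because neither is newer than the other. The sketch below shows this on made-up data in an in-memory SQLite database; the source column is an assumption added only to tell the tied rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table financials_table (company_id text, prd_end_dt text, source text)"
)
# Hypothetical rows; company A has two rows for its latest period end date.
conn.executemany("insert into financials_table values (?, ?, ?)", [
    ("A", "2023-03-31", "quarterly"),
    ("A", "2023-06-30", "quarterly"),
    ("A", "2023-06-30", "monthly"),
    ("B", "2023-05-31", "monthly"),
])

latest = conn.execute("""
    select company_id, prd_end_dt, source
    from financials_table ft
    where not exists
    (
      select null
      from financials_table newer
      where newer.company_id = ft.company_id
      and newer.prd_end_dt > ft.prd_end_dt
    )
    order by company_id, source
""").fetchall()

for row in latest:
    print(row)
```

Here company A contributes two rows and company B one. If exactly one row per company is required, an extra tie-breaking condition or a row_number()-based query would be needed.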

Get the maximum prd_end_dt on-the-fly. Then compare the dates:

select *
from
(
  select ft.*, max(prd_end_dt) over (partition by company_id) as max_prd_end_dt
  from financials_table ft
) with_max_prd_end_dt
where prd_end_dt = max_prd_end_dt
order by company_id;

Rank each company's rows by date and keep only the newest:

select *
from financials_table
order by rank() over (partition by company_id order by prd_end_dt desc)
fetch first row with ties;
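
Not every DBMS accepts fetch first row with ties. A variant that may be more portable is to compute the ranking in a derived table and filter on it. The sketch below does that with row_number(), which keeps exactly one row per company even when dates tie, and also restricts the result to the companies in peers_list. The sample rows, the revenue column and the contents of peers_list are assumptions for illustration, and it needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table financials_table (company_id text, prd_end_dt text, revenue real)"
)
# Hypothetical rows; company D is not part of the peer group.
conn.executemany("insert into financials_table values (?, ?, ?)", [
    ("A", "2023-03-31", 100.0),
    ("A", "2023-06-30", 110.0),
    ("B", "2023-03-31", 200.0),
    ("C", "2023-05-31", 300.0),
    ("D", "2023-06-30", 400.0),
])

# Hypothetical stand-in for the peers_list mentioned in the question.
peers_list = ["A", "B", "C"]
placeholders = ", ".join("?" for _ in peers_list)

query = f"""
    select company_id, prd_end_dt, revenue
    from
    (
      select ft.*,
             row_number() over (partition by company_id order by prd_end_dt desc) as rn
      from financials_table ft
      where company_id in ({placeholders})
    ) ranked
    where rn = 1
    order by company_id
"""
latest = conn.execute(query, peers_list).fetchall()

for row in latest:
    print(row)
```

Replacing row_number() with rank() would bring back the with-ties behavior, returning every row that shares a company's latest date.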
