How would I write this SQL query?

I have the following tables:

PERSON_T              DISEASE_T               DRUG_T
=========             ==========              ========
PERSON_ID             DISEASE_ID              DRUG_ID
GENDER                PERSON_ID               PERSON_ID
NAME                  DISEASE_START_DATE      DRUG_START_DATE
                      DISEASE_END_DATE        DRUG_END_DATE

I want to write a query that takes a disease id as input and returns one row for each person in the database, with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies whether they took the drug before contracting the disease. I.e., true would mean drug_start_date < disease_start_date. False would mean drug_start_date > disease_start_date, or that the person never took that particular drug.

We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible to create a query that will return the result set as I want it, or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.

Based on the info provided:

   SELECT p.name,
          p.gender,
          CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
          dt.drug_id
     FROM PERSON_T p
LEFT JOIN DISEASE_T d ON d.person_id = p.person_id
                     AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
                   AND dt.drug_start_date < d.disease_start_date

...but there are going to be a lot of rows that look like duplicates except for the drug_id column.
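If those near-duplicate rows are acceptable, the client can fold them back into one entry per person. The following is only a sketch under assumptions: the sample rows are invented, `fold_rows` is a hypothetical helper, and it keys on the selected columns because the query above does not return person_id (two people sharing a name and gender would be merged, so selecting p.person_id as well would be safer in practice).

```python
from collections import defaultdict

# Hypothetical rows shaped like the query's output:
# (name, gender, had_disease, drug_id)
rows = [
    ("Ann", "F", "Y", 1),
    ("Ann", "F", "Y", 3),
    ("Bob", "M", "Y", 2),
    ("Cat", "F", "N", None),  # no matching drug row, so drug_id is NULL
]


def fold_rows(rows):
    """Collapse rows that differ only in drug_id into one entry per person."""
    drugs_by_person = defaultdict(set)
    for name, gender, had_disease, drug_id in rows:
        # Looking up the key creates an empty set, so people with no
        # qualifying drug still appear in the result.
        drugs = drugs_by_person[(name, gender, had_disease)]
        if drug_id is not None:
            drugs.add(drug_id)
    return {key: sorted(drugs) for key, drugs in drugs_by_person.items()}


folded = fold_rows(rows)
for person, drug_ids in folded.items():
    print(person, drug_ids)
```

This keeps the SQL simple but leaves the pivoting to the application, which is close to what the Java code in the question already does.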

You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).

You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):

  1. Use a series of CASE expressions in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time.
  2. Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses CASE expressions as described above, but obtains the list of values from the data itself.

The two options essentially do the same thing; with the second option you're just trading simplicity and ease of maintenance for flexibility.

For example, using option 1:

-- @DiseaseId is the input parameter (T-SQL syntax; use your driver's placeholder elsewhere)
select
    p.NAME,
    p.GENDER,
    (case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
    -- one column per known drug: 1 if any qualifying DRUG_T row has that DRUG_ID
    (case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
    (case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
    (case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3
from PERSON_T p
left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = @DiseaseId
-- only drugs started before the disease survive this join
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE
group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID

As you can tell, this gets a little laborious once you go beyond just a few potential values.

The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:

  1. Gather the list of potential DRUG_ID values, along with names for the columns.
  2. Prepare three string values: the SQL prefix (everything before the first drug-related CASE expression), the SQL suffix (everything after the last drug-related CASE expression), and the dynamic portion.
  3. Construct the dynamic portion by combining drug CASE expressions based upon the previously retrieved list.
  4. Combine them into a single (hopefully valid) SQL statement and execute it.
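The four steps above can be sketched in a host language instead of a procedural SQL dialect. The example below is a rough illustration, not a definitive implementation: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres, the sample rows are invented, `build_pivot_query` is a hypothetical helper, and since the schema has no drug name column the column names are simply derived from DRUG_ID. It also uses `max(...)` in place of the `sum(...) > 0` test, which gives the same 0/1 result.

```python
import sqlite3


def build_pivot_query(drug_ids):
    """Steps 2-4: join the prefix, one CASE column per drug, and the suffix."""
    prefix = (
        "select p.NAME, p.GENDER,\n"
        "       (case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE"
    )
    # Dynamic portion. Ids are coerced to int before being inlined, so no
    # arbitrary text from the data can end up in the SQL string.
    dynamic = "".join(
        f",\n       max(case when dr.DRUG_ID = {int(i)} then 1 else 0 end)"
        f" as TOOK_DRUG_{int(i)}"
        for i in drug_ids
    )
    suffix = (
        "\nfrom PERSON_T p\n"
        "left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = ?\n"
        "left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID\n"
        "     and dr.DRUG_START_DATE < d.DISEASE_START_DATE\n"
        "group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID\n"
        "order by p.PERSON_ID"
    )
    return prefix + dynamic + suffix


# Invented sample data; dates are ISO strings so they compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table PERSON_T (PERSON_ID integer primary key, GENDER text, NAME text);
create table DISEASE_T (DISEASE_ID integer, PERSON_ID integer,
                        DISEASE_START_DATE text, DISEASE_END_DATE text);
create table DRUG_T (DRUG_ID integer, PERSON_ID integer,
                     DRUG_START_DATE text, DRUG_END_DATE text);
insert into PERSON_T values (1, 'F', 'Ann'), (2, 'M', 'Bob'), (3, 'F', 'Cat');
insert into DISEASE_T values (7, 1, '2020-06-01', null), (7, 2, '2020-06-01', null);
insert into DRUG_T values
    (1, 1, '2020-01-01', null),  -- Ann: drug 1 before the disease
    (2, 1, '2020-09-01', null),  -- Ann: drug 2 after the disease
    (2, 2, '2020-03-01', null),  -- Bob: drug 2 before the disease
    (1, 3, '2020-01-01', null);  -- Cat: drug 1, never had the disease
""")

# Step 1: gather the list of drug ids from the data itself.
drug_ids = [row[0] for row in conn.execute(
    "select distinct DRUG_ID from DRUG_T order by DRUG_ID")]

# Step 4: run the assembled statement; the disease id stays a bound parameter.
rows = conn.execute(build_pivot_query(drug_ids), (7,)).fetchall()
for row in rows:
    print(row)
```

With the sample rows above, each person comes back exactly once: Ann with HAD_DISEASE = 1 and only TOOK_DRUG_1 set, Bob with HAD_DISEASE = 1 and only TOOK_DRUG_2 set, and Cat with all three flags at 0. Only the column list is generated from data; the disease id is still passed as a parameter rather than concatenated into the string.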
