MySQL: query missing rows between min and max of a field

I am working with a parts / motorcycle fitment MySQL database where every part is linked to every motorcycle it can be installed on. It looks like this:

part_number motorcycle      year
1000        HONDA_CBR1000   2008
1000        HONDA_CBR1000   2009
1000        HONDA_CBR1000   2010
1000        HONDA_CBR1000   2011
1000        HONDA_CBR1000   2012
1000        HONDA_CBR1000   2013
1001        HONDA_CBR600    2008
1001        HONDA_CBR600    2009
1001        HONDA_CBR1000   2008
1001        HONDA_CBR1000   2009
1001        HONDA_CBR1000   2013

This means that:

  • part #1000 can be installed on the Honda CBR1000 from 2008 to 2013
  • part #1001 can be installed on the Honda CBR600 from 2008 to 2009 AND on the Honda CBR1000 from 2008 to 2013.

Unfortunately, the table (which has ~650,000 rows) was not always filled in correctly. In this example, you will notice the following rows are missing:

part_number motorcycle      year
1001        HONDA_CBR1000   2010
1001        HONDA_CBR1000   2011
1001        HONDA_CBR1000   2012

because part #1001, which is listed for the HONDA_CBR1000 in 2008, 2009 and 2013, can also be installed in the "forgotten" years in between (2010, 2011 and 2012).

So the simple query:

SELECT * FROM mytable WHERE motorcycle = 'HONDA_CBR1000' AND year = '2011'

would only retrieve the row for part #1000 (while in reality, part #1001 can also be installed on this bike).

In plain English, I guess a query like

SELECT * FROM mytable WHERE motorcycle = 'HONDA_CBR1000'
AND ("minimum year of part_number applicable to HONDA_CBR1000" <= '2011')
AND ("maximum year of part_number applicable to HONDA_CBR1000" >= '2011')

would retrieve all results (1000 and 1001).

But how can I express that in SQL? Do you think it would be too slow?

Thanks for any help!

SELECT part_number, MAX(year), MIN(year)
  FROM mytable
 WHERE motorcycle = 'HONDA_CBR1000'
 GROUP BY part_number
HAVING MIN(year) <= 2011
   AND MAX(year) >= 2011
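
To check the grouped query against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python. SQLite stands in for MySQL only so the example is self-contained, and the names `SAMPLE_ROWS` and `parts_fitting` are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question: (part_number, motorcycle, year).
SAMPLE_ROWS = [
    (1000, "HONDA_CBR1000", y) for y in range(2008, 2014)
] + [
    (1001, "HONDA_CBR600", 2008),
    (1001, "HONDA_CBR600", 2009),
    (1001, "HONDA_CBR1000", 2008),
    (1001, "HONDA_CBR1000", 2009),
    (1001, "HONDA_CBR1000", 2013),
]


def parts_fitting(conn, motorcycle, year):
    """Return part numbers whose min..max year range for this bike covers `year`."""
    sql = """
        SELECT part_number
          FROM mytable
         WHERE motorcycle = ?
         GROUP BY part_number
        HAVING MIN(year) <= ? AND MAX(year) >= ?
         ORDER BY part_number
    """
    return [row[0] for row in conn.execute(sql, (motorcycle, year, year))]


# SQLite is an assumption here; the SQL itself is the same as the MySQL query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (part_number INTEGER, motorcycle TEXT, year INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", SAMPLE_ROWS)

print(parts_fitting(conn, "HONDA_CBR1000", 2011))  # [1000, 1001]
```

Part #1001 shows up for 2011 even though its 2011 row is missing, because only the minimum and maximum year per part are compared. On the real ~650,000-row table, an index starting with `motorcycle` (for example on `motorcycle, part_number, year`) would probably help, but that is a guess worth confirming with `EXPLAIN`.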

*********************Edit****************

To improve performance, let's try this:

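 SELECT DISTINCT t.part_number
   FROM mytable t
   -- min/max year per part for this motorcycle
   JOIN (SELECT part_number, motorcycle, MIN(year) AS minyear, MAX(year) AS maxyear
           FROM mytable
          WHERE motorcycle = 'HONDA_CBR1000'
          GROUP BY part_number, motorcycle) t1
     ON t1.part_number = t.part_number
    AND t1.motorcycle = t.motorcycle
  -- compare the requested year with the range, not with the stored rows
  WHERE 2011 BETWEEN t1.minyear AND t1.maxyear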

*********************EDIT 2**********************************

This query lists the years that are missing. You can put the entire query into an INSERT statement:

-- Missing years per part: every year between min and max that has no row.
SELECT q1.partsnumber, yrs.allyears
  FROM (SELECT MAX(year) AS maxyear, MIN(year) AS minyear, partsnumber
          FROM yourtable
         GROUP BY partsnumber) q1
  JOIN (SELECT 1950+1+b+a*10 AS allyears
          FROM (select 0 as a union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) a,
               (select 0 as b union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) b) yrs
    ON yrs.allyears BETWEEN q1.minyear AND q1.maxyear
-- MySQL has no MINUS operator, so existing rows are subtracted with NOT EXISTS.
 WHERE NOT EXISTS (SELECT 1
                     FROM yourtable t
                    WHERE t.partsnumber = q1.partsnumber
                      AND t.year = yrs.allyears)

yrs --> Subquery that generates the years from 1951 to 2050 (if your data goes beyond 2050 or before 1951, this has to be changed).

For each part number, I take the minimum and maximum year, then use the yrs list as a reference to find every year between them.

That gives all years between the min and max for each part. Subtracting the rows that already exist in the table (the MINUS step) leaves the years that are missing.
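
As a rough sketch of "putting the query into an insert statement", the snippet below back-fills the gaps with a single `INSERT ... SELECT`. It makes several assumptions that are not in the answer: it uses the question's column names (`part_number`, `motorcycle`, `year`), groups by part and motorcycle rather than by part alone, replaces `MINUS` with `NOT EXISTS`, and runs on an in-memory SQLite database from Python instead of MySQL. `DIGITS`, `FILL_GAPS_SQL` and `years_for` are illustrative names.

```python
import sqlite3

# Ten single-digit rows, used twice to generate 100 consecutive years.
DIGITS = " UNION ALL ".join(f"SELECT {d} AS d" for d in range(10))

FILL_GAPS_SQL = f"""
INSERT INTO mytable (part_number, motorcycle, year)
SELECT q1.part_number, q1.motorcycle, yrs.allyears
  FROM (SELECT part_number, motorcycle,
               MIN(year) AS minyear, MAX(year) AS maxyear
          FROM mytable
         GROUP BY part_number, motorcycle) q1
  JOIN (SELECT 1951 + tens.d * 10 + ones.d AS allyears   -- 1951 .. 2050
          FROM ({DIGITS}) tens CROSS JOIN ({DIGITS}) ones) yrs
    ON yrs.allyears BETWEEN q1.minyear AND q1.maxyear
 WHERE NOT EXISTS (SELECT 1
                     FROM mytable t
                    WHERE t.part_number = q1.part_number
                      AND t.motorcycle = q1.motorcycle
                      AND t.year = yrs.allyears)
"""


def years_for(conn, part_number, motorcycle):
    """Return the sorted list of years stored for one part on one motorcycle."""
    sql = """SELECT year FROM mytable
              WHERE part_number = ? AND motorcycle = ?
              ORDER BY year"""
    return [row[0] for row in conn.execute(sql, (part_number, motorcycle))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (part_number INTEGER, motorcycle TEXT, year INTEGER)"
)
# Part #1001 from the question, with 2010-2012 missing for the CBR1000.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1001, "HONDA_CBR600", 2008),
        (1001, "HONDA_CBR600", 2009),
        (1001, "HONDA_CBR1000", 2008),
        (1001, "HONDA_CBR1000", 2009),
        (1001, "HONDA_CBR1000", 2013),
    ],
)

conn.execute(FILL_GAPS_SQL)
print(years_for(conn, 1001, "HONDA_CBR1000"))  # [2008, 2009, 2010, 2011, 2012, 2013]
```

On the real table it would be prudent to run the `SELECT` part on its own first and review the rows before inserting them.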

Here is my approach for getting all combinations of parts and motorcycles and the years for which they have no data.

Generate all the rows for all the years, then filter out the ones you already have. The first step uses a cross join; the second uses a left join:

select pm.part_number, pm.motorcycle, y.year
from (select part_number, motorcycle, min(year) as miny, max(year) as maxy
      from mytable
      group by part_number, motorcycle
     ) pm cross join
     (select distinct year
      from mytable
     ) y
     on y.year between pm.miny and pm.maxy left join
     mytable t
     on t.part_number = pm.part_number and t.motorcycle = pm.motorcycle and
        t.year = y.year
-- keep only the generated years that have no matching row in mytable
where t.year is null;

This assumes that every year appears somewhere in your table. The y table is just a list of years, so you can get it from another table or by creating a derived table. The subquery is just a convenient way to get it.
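
To illustrate the "derived table" option, here is a hedged sketch in which the list of years comes from a recursive CTE instead of `select distinct year`. This is a substitution, not part of the answer: recursive CTEs need MySQL 8.0 or later, and the example runs on an in-memory SQLite database from Python so that it is self-contained. `MISSING_SQL`, `missing_years` and the 2000 to 2020 bounds are illustrative assumptions.

```python
import sqlite3

MISSING_SQL = """
WITH RECURSIVE y(year) AS (
    SELECT ?                                   -- first year to generate
    UNION ALL
    SELECT year + 1 FROM y WHERE year < ?      -- stop at the last year
)
SELECT pm.part_number, pm.motorcycle, y.year
  FROM (SELECT part_number, motorcycle,
               MIN(year) AS miny, MAX(year) AS maxy
          FROM mytable
         GROUP BY part_number, motorcycle) pm
  JOIN y
    ON y.year BETWEEN pm.miny AND pm.maxy
  LEFT JOIN mytable t
    ON t.part_number = pm.part_number
   AND t.motorcycle = pm.motorcycle
   AND t.year = y.year
 WHERE t.year IS NULL                          -- no existing row for this year
 ORDER BY pm.part_number, pm.motorcycle, y.year
"""


def missing_years(conn, first_year, last_year):
    """Return (part_number, motorcycle, year) tuples that are absent from mytable."""
    return conn.execute(MISSING_SQL, (first_year, last_year)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (part_number INTEGER, motorcycle TEXT, year INTEGER)"
)
# Only three rows: 2010-2012 appear nowhere in the table, so a
# "select distinct year" list could not reveal them as missing.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1001, "HONDA_CBR1000", 2008),
        (1001, "HONDA_CBR1000", 2009),
        (1001, "HONDA_CBR1000", 2013),
    ],
)

for row in missing_years(conn, 2000, 2020):
    print(row)
```

With a generated list the result no longer depends on which years happen to be present in the data, at the cost of having to choose the bounds yourself.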
