简体   繁体   中英

Select where multiple fields are not in subquery (excluding join)

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.

In technical sense my requirement is a left join where right side is 'null' (aka an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):

Select * from xxxx            //xxxx is a result for a multiple table join
where xxxx~key not in         (select key from archive_table where [conditions] ) 
  and xxxx~foreign_key not in (select key from archive_table where [conditions] )

Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.

Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).

For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).

I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases. Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.

Select * from xxxx            //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS  (
   select key from archive_table 
   where key = xxxx~key OR key = XXXX-foreign_key 
) 

EDIT: With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.

This is where context becomes helpful for understanding the problem:

  • Initial requirement: pull all vendors , without financial records in 3 tables.
  • Additional requirements: also exclude based on alternative payers ( 1-* relationship). This is what example above is based on.
  • More requirements: also exclude based on alternative payee ( *-* relationship between payer and payee ).

Many-to-many join exponentially increased the record count within the construct I labeled XXXX , which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers , and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.

At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance. than any smarter looking alternatives.

select * from lfa1 inner join lfb1 
       where 
          ( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
       and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
                                                        inner join bsik  on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs  )
       and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik  on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
          )
           and [3 more sets of sub queries like the 3 above, just checking different tables].

My Conclusion :

  • When exclusion is based on a single field, both not in / not exits work. One might be better than the other, depending on filters you use.
  • When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
  • The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).

Anyways, any additional insight into this is welcome.

EDIT2 : Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works . I just failed horribly at implementing it the first time.

select header~key 
from headers left join items on headers~key = items~key
where items~key is null

if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields

No, it is not possible to check two columns in subquery, as SAP Help clearly says:

The clauses in the subquery subquery_clauses must constitute a scalar subquery.

Scalar is keyword here, ie it should return exactly one column.

Your subquery can have multi-column key, and such syntax is completely legit:

SELECT  planetype, seatsmax
  FROM  saplane AS plane
 WHERE seatsmax < @wa-seatsmax AND
       seatsmax >= ALL ( SELECT  seatsocc
                           FROM  sflight
                           WHERE carrid = @wa-carrid AND
                                 connid = @wa-connid     )

however you say that these two fields should be checked against different tables

Those 2 fields are also checked against two more tables

so it's not the case for you. Your only choice seems to be multi-join.

PS FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES , it won't be that easy.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM