
Python: use one sqlite query to find the NOT EXISTS result

I have a dataset of a million entries, made up of songs and their artists.

I have:

a track_id
an artist_id.

There are 3 tables:

tracks (track_id, title, artist_id),
artists(artist_id and artist_name) and
artist_term (artist_id and term).

Using only one query, I have to count the number of tracks whose artists don't have any linked terms.

For reference, the schema of the DB is as follows:

CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id) REFERENCES artists(artist_id));

How do I get to the solution? Please help!

You can use not exists:

select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
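
Since the question is about running this from Python, here is a minimal sketch using the standard sqlite3 module. The in-memory database and the sample rows (T1..T3, A1, A2) are made up for illustration; only the schema and the query come from the post.

```python
import sqlite3

# In-memory database with the schema from the question and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text,
                     year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text,
                          FOREIGN KEY(artist_id) REFERENCES artists(artist_id));

INSERT INTO artists VALUES ('A1', 'Artist One'), ('A2', 'Artist Two');
INSERT INTO tracks (track_id, title, artist_id) VALUES
    ('T1', 'Song 1', 'A1'),
    ('T2', 'Song 2', 'A1'),
    ('T3', 'Song 3', 'A2');
-- Only A1 has terms, so only T3 (by A2) should be counted.
INSERT INTO artist_term VALUES ('A1', 'rock'), ('A1', 'pop');
""")

query = """
SELECT count(*) AS cnt
FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
"""

# fetchone() returns a 1-tuple holding the count.
(cnt,) = conn.execute(query).fetchone()
print(cnt)  # 1
```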

You do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.

For performance, you want an index on tracks(artist_id) and another one on artist_term(artist_id).
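
A sketch of how those two indexes could be created from Python and how to inspect the plan SQLite picks. The index names idx_tracks_artist_id and idx_artist_term_artist_id are arbitrary choices, not something from the post, and the exact EXPLAIN QUERY PLAN text varies between SQLite versions, so it is only printed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text,
                     year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text,
                          FOREIGN KEY(artist_id) REFERENCES artists(artist_id));

-- Index names are arbitrary; IF NOT EXISTS makes the script re-runnable.
CREATE INDEX IF NOT EXISTS idx_tracks_artist_id ON tracks(artist_id);
CREATE INDEX IF NOT EXISTS idx_artist_term_artist_id ON artist_term(artist_id);
""")

query = """
SELECT count(*) AS cnt
FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
"""

# Ask SQLite how it intends to execute the query; output is version-dependent.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row)

# List the indexes that now exist in the database.
index_names = {
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
}
print(sorted(index_names))
```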

An anti left join would also get the job done:

select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
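
As a quick sanity check, the sketch below runs the anti left join next to the not exists version on made-up rows. The trimmed-down tables (only the columns the queries touch) and the sample data are assumptions for illustration. An artist with several terms does not inflate the count, because all of its joined rows are filtered out by the is null test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down tables: only the columns the two queries touch.
CREATE TABLE tracks (track_id text PRIMARY KEY, artist_id text);
CREATE TABLE artist_term (artist_id text, term text);

INSERT INTO tracks VALUES ('T1', 'A1'), ('T2', 'A2'), ('T3', 'A3'), ('T4', 'A3');
-- A1 has two terms; A2 and A3 have none.
INSERT INTO artist_term VALUES ('A1', 'rock'), ('A1', 'pop');
""")

not_exists_sql = """
SELECT count(*) FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
"""

anti_join_sql = """
SELECT count(*) FROM tracks t
LEFT JOIN artist_term at ON at.artist_id = t.artist_id
WHERE at.artist_id IS NULL
"""

not_exists_count = conn.execute(not_exists_sql).fetchone()[0]
anti_join_count = conn.execute(anti_join_sql).fetchone()[0]
print(not_exists_count, anti_join_count)  # 3 3
```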

You can join the tables tracks and artists, then left join the table artist_term, so as to find the unmatched artist_ids:

select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null

The condition at.artist_id is null in the WHERE clause keeps only the unmatched rows, which are then counted.

If I'm not mistaken, such a query can be built in SQLite much as in its sibling SQL dialects. If so, it should look something like this:

SELECT COUNT(track_id)
FROM tracks as t
WHERE EXISTS (
    SELECT *
    FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS(
        SELECT *
        FROM artist_term as at
        WHERE at.artist_id = a.artist_id
    )
)

So this query basically says: count the tracks (identified by their unique track_id) for which there is an artist with the same artist_id, and for which no artist_term row refers to that artist's artist_id.
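
One thing worth checking on real data: queries that go through the artists table only count tracks whose artist_id actually appears in artists, while a query that checks tracks against artist_term directly counts every track without terms. The schema does not declare a foreign key from tracks to artists, so the two can differ. The sketch below uses made-up rows, including a hypothetical "orphan" track T3 whose artist A9 is missing from artists, to show the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down tables with made-up rows.
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text);

INSERT INTO artists VALUES ('A1', 'Artist One'), ('A2', 'Artist Two');
INSERT INTO artist_term VALUES ('A1', 'rock');
-- T1: artist has a term; T2: artist has no term;
-- T3: hypothetical orphan whose artist A9 is not in artists.
INSERT INTO tracks VALUES
    ('T1', 'Song 1', 'A1'),
    ('T2', 'Song 2', 'A2'),
    ('T3', 'Song 3', 'A9');
""")

# Checks tracks against artist_term directly, ignoring artists.
direct_sql = """
SELECT count(*) FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
"""

# Requires a matching artists row first, as in the nested EXISTS query.
via_artists_sql = """
SELECT count(track_id) FROM tracks AS t
WHERE EXISTS (
    SELECT * FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS (
        SELECT * FROM artist_term AS at WHERE at.artist_id = a.artist_id
    )
)
"""

direct_count = conn.execute(direct_sql).fetchone()[0]
via_artists_count = conn.execute(via_artists_sql).fetchone()[0]
print(direct_count, via_artists_count)  # 2 1
```

If every artist_id in tracks is present in artists, both forms return the same number, and the shorter one is enough.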

Hope this helps!
