Python: use one sqlite query to find the NOT EXISTS result
I have a dataset of millions of entries, made up of songs and their artists.
Each entry has
a track_id
an artist_id.
There are 3 tables:
tracks (track_id, title, artist_id),
artists(artist_id and artist_name) and
artist_term (artist_id and term).
Using only one query, I have to count the number of tracks whose artists don't have any linked terms.
For reference, the schema of the DB is as follows:
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id) REFERENCES artists(artist_id));
How do I get to the solution? Please help!
You can use not exists:
select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
Note that you do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.
For performance, you want an index on tracks(artist_id) and another one on artist_term(artist_id).
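Since the question asks about doing this from Python, here is a minimal sketch of the not exists query with the two indexes, run through the standard sqlite3 module. Only the schema comes from the question; the in-memory database, the sample rows, and the index names are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with the schema from the question.
# Index names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id) REFERENCES artists(artist_id));

CREATE INDEX idx_tracks_artist ON tracks(artist_id);
CREATE INDEX idx_artist_term_artist ON artist_term(artist_id);
""")

conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [("A1", "Artist One"), ("A2", "Artist Two"), ("A3", "Artist Three")],
)
# Only artist A1 has linked terms.
conn.executemany(
    "INSERT INTO artist_term VALUES (?, ?)",
    [("A1", "rock"), ("A1", "indie")],
)
conn.executemany(
    "INSERT INTO tracks (track_id, title, artist_id) VALUES (?, ?, ?)",
    [("T1", "Song 1", "A1"), ("T2", "Song 2", "A2"),
     ("T3", "Song 3", "A2"), ("T4", "Song 4", "A3")],
)

NOT_EXISTS_QUERY = """
SELECT count(*) AS cnt
FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
"""

# fetchone() returns a one-column row holding the count.
untagged_tracks = conn.execute(NOT_EXISTS_QUERY).fetchone()[0]
print(untagged_tracks)  # 3
```

With these rows, T2, T3 and T4 belong to artists without any term, so they are the ones counted.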
An anti-left join would also get the job done:
select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
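To see why the is null filter works, it can help to look at what the left join returns before the where clause is applied. The sketch below uses a cut-down version of the tables (fewer columns) and made-up rows, so treat it as an illustration rather than the real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the tables from the question.
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, artist_id text);
CREATE TABLE artist_term (artist_id text, term text);

INSERT INTO tracks VALUES ('T1', 'Song 1', 'A1'), ('T2', 'Song 2', 'A2');
INSERT INTO artist_term VALUES ('A1', 'rock'), ('A1', 'indie');
""")

# Unfiltered left join: a track repeats once per matching term,
# and a track without any match gets NULLs on the artist_term side.
joined = conn.execute("""
SELECT t.track_id, at.artist_id, at.term
FROM tracks t
LEFT JOIN artist_term at ON at.artist_id = t.artist_id
ORDER BY t.track_id, at.term
""").fetchall()
for row in joined:
    print(row)
# ('T1', 'A1', 'indie')
# ('T1', 'A1', 'rock')
# ('T2', None, None)

# Keeping only the NULL rows leaves exactly one row per unmatched track.
anti_join_count = conn.execute("""
SELECT count(*) AS cnt
FROM tracks t
LEFT JOIN artist_term at ON at.artist_id = t.artist_id
WHERE at.artist_id IS NULL
""").fetchone()[0]
print(anti_join_count)  # 1
```

The duplicated T1 rows are discarded by the filter, which is why count(*) does not over-count here.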
You can join the tables tracks and artists, and left join the table artist_term, to find the unmatched artist_ids:
select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null
The condition at.artist_id is null in the WHERE clause returns only the unmatched rows, which are then counted.
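Going through artists can change the result in two situations, sketched below with made-up rows. The first is a track whose artist_id has no row in artists: the inner join drops it. The second is a duplicated artists row, which the schema allows because artists declares no primary key; whether the real dataset contains such rows is not stated in the question. The tables are simplified and the helper scalar is a hypothetical convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the tables from the question.
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text);

-- A2 appears twice in artists; A9 (used by T3) does not appear at all.
INSERT INTO artists VALUES ('A1', 'Artist One'), ('A2', 'Artist Two'), ('A2', 'Artist Two');
INSERT INTO artist_term VALUES ('A1', 'rock');
INSERT INTO tracks VALUES ('T1', 'Song 1', 'A1'), ('T2', 'Song 2', 'A2'), ('T3', 'Song 3', 'A9');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

JOIN_BODY = """
FROM tracks t
INNER JOIN artists a ON a.artist_id = t.artist_id
LEFT JOIN artist_term at ON at.artist_id = a.artist_id
WHERE at.artist_id IS NULL
"""

# T2 is matched twice (duplicate A2 row); DISTINCT collapses it to one.
with_artists_distinct = scalar("SELECT count(DISTINCT t.track_id) " + JOIN_BODY)
# Without DISTINCT the duplicate artists row inflates the count.
with_artists_plain = scalar("SELECT count(*) " + JOIN_BODY)
# Skipping artists entirely also counts T3, whose artist is unknown.
without_artists = scalar("""
SELECT count(*) FROM tracks t
WHERE NOT EXISTS (SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id)
""")

print(with_artists_distinct, with_artists_plain, without_artists)  # 1 2 2
```

Which number is the right one depends on whether tracks with an unknown artist should count as tracks "whose artists don't have any linked terms".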
If I'm not mistaken, such a query can be built in SQLite in much the same way as in its sibling SQL dialects. If so, it should look something like this:
SELECT COUNT(track_id)
FROM tracks AS t
WHERE EXISTS (
    SELECT *
    FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS (
        SELECT *
        FROM artist_term AS at
        WHERE at.artist_id = a.artist_id
    )
)
So this query basically says: count the tracks (identified by their unique track_id) for which there is an artist with the same artist_id, and for which no artist_term row exists that refers to that artist's artist_id.
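As a rough sketch of how the nested EXISTS form could be wrapped for use from Python: the function name count_tracks_without_terms is hypothetical, the tables are simplified, and the rows are made up. The sample data includes a duplicated artists row to show that EXISTS only tests for a match and does not multiply track rows, so a plain COUNT is enough here.

```python
import sqlite3

NESTED_EXISTS_QUERY = """
SELECT COUNT(track_id)
FROM tracks AS t
WHERE EXISTS (
    SELECT *
    FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS (
        SELECT *
        FROM artist_term AS at
        WHERE at.artist_id = a.artist_id
    )
)
"""

def count_tracks_without_terms(conn):
    """Hypothetical helper: run the nested EXISTS query and return the count."""
    return conn.execute(NESTED_EXISTS_QUERY).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the tables from the question.
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text);

-- A2 is deliberately duplicated in artists.
INSERT INTO artists VALUES ('A1', 'Artist One'), ('A2', 'Artist Two'), ('A2', 'Artist Two');
INSERT INTO artist_term VALUES ('A1', 'rock');
INSERT INTO tracks VALUES ('T1', 'Song 1', 'A1'), ('T2', 'Song 2', 'A2'), ('T3', 'Song 3', 'A2');
""")

result = count_tracks_without_terms(conn)
print(result)  # 2
```

T2 and T3 are each counted once even though their artist appears twice in artists.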
Hope this helps!