
PostgreSQL nested query performing slowly

I have these three tables:

  1. create table words (id integer, word text, freq integer);
  2. create table sentences (id integer, sentence text);
  3. create table index (wordId integer, sentenceId integer, position integer);

The table index is an inverted index: it records which word occurs in which sentence, and at which position. Furthermore, I have indexes on words.id and sentences.id.
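As an illustration of how such an inverted index can be filled, here is a minimal Python sketch. It is not the asker's loader. It assumes a hypothetical tokenizer that simply splits on whitespace and numbers word ids in order of first appearance. It produces rows shaped like words (id, word, freq) and index (wordId, sentenceId, position).

```python
from collections import Counter


def build_inverted_index(sentences):
    """Turn {sentence_id: text} into rows for the words and index tables.

    Assumes a hypothetical whitespace tokenizer. Returns (words_rows, index_rows):
      words_rows = [(id, word, freq), ...]
      index_rows = [(wordId, sentenceId, position), ...]
    """
    word_ids = {}      # word -> id, assigned in order of first appearance
    freq = Counter()   # word -> total number of occurrences
    index_rows = []
    for sid in sorted(sentences):
        for pos, token in enumerate(sentences[sid].split()):
            wid = word_ids.setdefault(token, len(word_ids) + 1)
            freq[token] += 1
            index_rows.append((wid, sid, pos))
    words_rows = [(wid, w, freq[w]) for w, wid in word_ids.items()]
    return words_rows, index_rows


if __name__ == "__main__":
    words, index = build_inverted_index({1: "a b a", 2: "b c"})
    print(words)  # [(1, 'a', 2), (2, 'b', 2), (3, 'c', 1)]
    print(index)  # [(1, 1, 0), (2, 1, 1), (1, 1, 2), (2, 2, 0), (3, 2, 1)]
```

Because "a" occurs twice in sentence 1, the pair (wordId, sentenceId) = (1, 1) appears twice. Only the position tells those rows apart.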

This query determines in which sentences a given word occurs and returns the first match:

select S.sentence from sentences S, words W, index I
where W.word = '#erhoehungen' and W.id = I.wordId and S.id = I.sentenceId
limit 1;
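To try the query logic without a PostgreSQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in. It rests on several assumptions. The sample sentences and the build_db helper are hypothetical, and the table name is quoted as "index". SQLite's planner also differs from PostgreSQL's, so the sketch says nothing about performance. It adds ORDER BY S.id because LIMIT 1 without ORDER BY returns an arbitrary matching row, not a well-defined "first" one.

```python
import sqlite3

# Hypothetical sample corpus standing in for the real sentences table.
SENTENCES = {
    1: "der #dreikampf beginnt",
    2: "bruederle und der #dreikampf",
    3: "bruederle spricht",
    4: "#dreikampf #dreikampf wieder",
}


def build_db():
    """Create the three tables in memory and fill them from SENTENCES."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (id INTEGER, word TEXT, freq INTEGER);
        CREATE TABLE sentences (id INTEGER, sentence TEXT);
        CREATE TABLE "index" (wordId INTEGER, sentenceId INTEGER, position INTEGER);
    """)
    word_ids, freq = {}, {}
    for sid, text in SENTENCES.items():
        conn.execute("INSERT INTO sentences VALUES (?, ?)", (sid, text))
        for pos, token in enumerate(text.split()):  # whitespace tokenizer (assumption)
            wid = word_ids.setdefault(token, len(word_ids) + 1)
            freq[token] = freq.get(token, 0) + 1
            conn.execute('INSERT INTO "index" VALUES (?, ?, ?)', (wid, sid, pos))
    conn.executemany(
        "INSERT INTO words VALUES (?, ?, ?)",
        [(wid, w, freq[w]) for w, wid in word_ids.items()],
    )
    return conn


def first_sentence_with(conn, word):
    """The question's single-word query, with ORDER BY added for a stable result."""
    row = conn.execute("""
        SELECT S.sentence FROM sentences S, words W, "index" I
        WHERE W.word = ? AND W.id = I.wordId AND S.id = I.sentenceId
        ORDER BY S.id
        LIMIT 1
    """, (word,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_db()
    print(first_sentence_with(conn, "#dreikampf"))  # der #dreikampf beginnt
```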

But when I want to retrieve a sentence in which two words occur together, like this:

select S.sentence from sentences S, words W, index I
where W.word = '#dreikampf' and I.wordId = W.id and S.id = I.sentenceId and
S.id in (
    select S.id from sentences S, words W, index I
    where W.word = 'bruederle' and W.id = I.wordId and S.id = I.sentenceId
)
limit 1;
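The nested query can be checked the same way in a sqlite3 sketch with hypothetical sample data. The sketch renames the subquery aliases to S2/W2/I2 for readability. The original shadowed names resolve the same way. It also drops LIMIT 1 so that the full results can be compared. The second function shows that the subquery does not need its own join to sentences. The outer S.id already comes from sentences, so filtering on I2.sentenceId gives the same membership test.

```python
import sqlite3

# Hypothetical sample corpus; sentence 2 is the only one containing both words.
SENTENCES = {
    1: "der #dreikampf beginnt",
    2: "bruederle und der #dreikampf",
    3: "bruederle spricht",
    4: "#dreikampf #dreikampf wieder",
}


def build_db():
    """Create the three tables in memory and fill them from SENTENCES."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (id INTEGER, word TEXT, freq INTEGER);
        CREATE TABLE sentences (id INTEGER, sentence TEXT);
        CREATE TABLE "index" (wordId INTEGER, sentenceId INTEGER, position INTEGER);
    """)
    word_ids, freq = {}, {}
    for sid, text in SENTENCES.items():
        conn.execute("INSERT INTO sentences VALUES (?, ?)", (sid, text))
        for pos, token in enumerate(text.split()):
            wid = word_ids.setdefault(token, len(word_ids) + 1)
            freq[token] = freq.get(token, 0) + 1
            conn.execute('INSERT INTO "index" VALUES (?, ?, ?)', (wid, sid, pos))
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)",
                     [(wid, w, freq[w]) for w, wid in word_ids.items()])
    return conn


def nested_query(conn, first, second):
    """The question's IN-subquery form, without LIMIT."""
    sql = """
        SELECT S.sentence FROM sentences S, words W, "index" I
        WHERE W.word = ? AND I.wordId = W.id AND S.id = I.sentenceId
          AND S.id IN (
              SELECT S2.id FROM sentences S2, words W2, "index" I2
              WHERE W2.word = ? AND W2.id = I2.wordId AND S2.id = I2.sentenceId
          )
    """
    return [r[0] for r in conn.execute(sql, (first, second))]


def simplified_query(conn, first, second):
    """Same result; the subquery only touches index and words."""
    sql = """
        SELECT S.sentence FROM sentences S
        JOIN "index" I ON S.id = I.sentenceId
        JOIN words W ON I.wordId = W.id
        WHERE W.word = ?
          AND S.id IN (
              SELECT I2.sentenceId FROM "index" I2
              JOIN words W2 ON I2.wordId = W2.id
              WHERE W2.word = ?
          )
    """
    return [r[0] for r in conn.execute(sql, (first, second))]
```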

This query is much slower. Is there any trick to speed it up? Here is what I have done so far:

  • increased shared_buffers to 32MB
  • increased work_mem to 15MB
  • ran ANALYZE on all tables
  • as mentioned, created indexes on words.id and sentences.id

Regards.

Edit:

Here is the EXPLAIN ANALYZE output for the query: http://pastebin.com/t2M5w4na

These three CREATE statements are my original ones. Should I add primary keys to the tables sentences and words and reference them as foreign keys from the index table? And what primary key should I use for the index table? sentenceId and wordId together are not unique, and even if I add position (the position of the word in the sentence), the combination is still not unique.

Updated to:

  1. create table words (id integer, word text, freq integer, primary key(id));
  2. create table sentences (id integer, sentence text, primary key(id));
  3. create table index (wordId integer, sentenceId integer, position integer, foreign key(wordId) references words(id), foreign key(sentenceId) references sentences(id));
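The effect of these constraints can be tried in a small sqlite3 sketch. It assumes SQLite as a stand-in: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, while PostgreSQL enforces them by default. The sentence foreign key references sentences(id), the column that actually exists in the sentences table. The index table deliberately has no primary key here, so a repeated (wordId, sentenceId, position) triple is still accepted.

```python
import sqlite3


def make_schema():
    """Updated schema: primary keys on words/sentences, foreign keys from "index"."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, freq INTEGER);
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
        CREATE TABLE "index" (
            wordId INTEGER REFERENCES words(id),
            sentenceId INTEGER REFERENCES sentences(id),
            position INTEGER
        );
    """)
    # Hypothetical sample rows.
    conn.execute("INSERT INTO words VALUES (1, '#dreikampf', 1)")
    conn.execute("INSERT INTO sentences VALUES (1, 'der #dreikampf beginnt')")
    return conn


def insert_posting(conn, word_id, sentence_id, position):
    """Insert one index row; return False if a foreign key rejects it."""
    try:
        conn.execute('INSERT INTO "index" VALUES (?, ?, ?)',
                     (word_id, sentence_id, position))
        return True
    except sqlite3.IntegrityError:
        return False
```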

I guess this should be more efficient:

SELECT s.id, s.sentence FROM words w
JOIN INDEX i ON w.id = i.wordId
JOIN sentences s ON i.sentenceId = s.id
WHERE w.word IN ('#dreikampf', 'bruederle')
GROUP BY s.id, s.sentence
-- count distinct words: the same word can occur more than once in a sentence
HAVING COUNT(DISTINCT w.id) = 2

Just make sure the number of words in the IN clause matches the number in the HAVING clause.
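The GROUP BY/HAVING approach can be tried in a sqlite3 sketch, again as a stand-in for PostgreSQL with hypothetical sample data. The helper sentences_with_all is hypothetical. It builds the IN placeholders and the HAVING threshold from the same Python list, so the two numbers cannot drift apart. It also contrasts COUNT(*) with COUNT(DISTINCT w.id). The index table holds one row per occurrence, so a sentence that repeats one of the words can reach COUNT(*) = 2 without containing the other word.

```python
import sqlite3

# Hypothetical corpus: sentence 4 repeats #dreikampf but lacks bruederle.
SENTENCES = {
    1: "der #dreikampf beginnt",
    2: "bruederle und der #dreikampf",
    3: "bruederle spricht",
    4: "#dreikampf #dreikampf wieder",
}


def build_db():
    """Create the three tables in memory and fill them from SENTENCES."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (id INTEGER, word TEXT, freq INTEGER);
        CREATE TABLE sentences (id INTEGER, sentence TEXT);
        CREATE TABLE "index" (wordId INTEGER, sentenceId INTEGER, position INTEGER);
    """)
    word_ids, freq = {}, {}
    for sid, text in SENTENCES.items():
        conn.execute("INSERT INTO sentences VALUES (?, ?)", (sid, text))
        for pos, token in enumerate(text.split()):
            wid = word_ids.setdefault(token, len(word_ids) + 1)
            freq[token] = freq.get(token, 0) + 1
            conn.execute('INSERT INTO "index" VALUES (?, ?, ?)', (wid, sid, pos))
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)",
                     [(wid, w, freq[w]) for w, wid in word_ids.items()])
    return conn


def sentences_with_all(conn, words, distinct=True):
    """Ids of sentences containing every word in `words`."""
    # Only "?" placeholders and a fixed aggregate are formatted into the SQL;
    # the words themselves are passed as bound parameters.
    placeholders = ", ".join("?" for _ in words)
    counter = "COUNT(DISTINCT w.id)" if distinct else "COUNT(*)"
    sql = f"""
        SELECT s.id FROM words w
        JOIN "index" i ON w.id = i.wordId
        JOIN sentences s ON i.sentenceId = s.id
        WHERE w.word IN ({placeholders})
        GROUP BY s.id, s.sentence
        HAVING {counter} >= ?
        ORDER BY s.id
    """
    return [r[0] for r in conn.execute(sql, (*words, len(words)))]
```

With this data, the DISTINCT version finds only sentence 2. The COUNT(*) version also returns sentence 4.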


It looks like you don't have indexes on the columns wordId and sentenceId. Create them and the query should run much faster:

CREATE INDEX idx_index_wordId ON index USING btree (wordId);
CREATE INDEX idx_index_sentenceId ON index USING btree (sentenceId);
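Whether a lookup actually uses the new index can be checked with the planner. The sqlite3 sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for PostgreSQL's EXPLAIN ANALYZE. Two assumptions apply. SQLite's CREATE INDEX has no USING btree clause, so the sketch omits it. And PostgreSQL's planner may still choose a sequential scan when the table is small or the word is very common, so the real EXPLAIN ANALYZE output remains the thing to check.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(r[-1]) for r in rows)  # detail is the last column


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "index" '
                 '(wordId INTEGER, sentenceId INTEGER, position INTEGER)')
    lookup = 'SELECT sentenceId FROM "index" WHERE wordId = ?'
    before = plan(conn, lookup, (1,))
    conn.execute('CREATE INDEX idx_index_wordId ON "index" (wordId)')
    conn.execute('CREATE INDEX idx_index_sentenceId ON "index" (sentenceId)')
    after = plan(conn, lookup, (1,))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(before)  # a full scan of "index"
    print(after)   # a search using idx_index_wordId
```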

Using the keyword index as a table name is not a good idea: you may need to quote it in some cases. You should probably also add an id column to the index table and make it the primary key.
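A minimal sqlite3 sketch of that suggestion follows. It assumes an INTEGER PRIMARY KEY column named id, which SQLite fills in automatically. In PostgreSQL a serial or identity column would play that role. The table name is quoted as "index", and a name such as word_index (hypothetical) would avoid the quoting altogether. Even when the same (wordId, sentenceId, position) triple is inserted twice, each row still gets its own id.

```python
import sqlite3


def demo_surrogate_key():
    conn = sqlite3.connect(":memory:")
    # "index" is quoted because INDEX is an SQL keyword.
    conn.execute('''
        CREATE TABLE "index" (
            id INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
            wordId INTEGER,
            sentenceId INTEGER,
            position INTEGER
        )
    ''')
    rows = [(7, 1, 0), (7, 1, 0)]  # hypothetical: the same triple twice
    conn.executemany(
        'INSERT INTO "index" (wordId, sentenceId, position) VALUES (?, ?, ?)', rows)
    return conn.execute(
        'SELECT id, wordId, sentenceId, position FROM "index" ORDER BY id').fetchall()


if __name__ == "__main__":
    print(demo_surrogate_key())  # [(1, 7, 1, 0), (2, 7, 1, 0)]
```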

Please try Mosty Mostacho's query and post its EXPLAIN ANALYZE output after you create the indexes. It may be even faster.

Update:

Please try this new query:

select S.sentence from sentences S where S.id in
(select sentenceId from index I where 
I.wordId in (select id from words where word IN ('#dreikampf', 'bruederle'))
group by I.sentenceId
having count(distinct I.wordId) = 2
limit 1)
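The updated query can be run in a sqlite3 sketch (a stand-in for PostgreSQL, with hypothetical sample data and the table name quoted as "index"). The grouping works only on index and words. The sentences table is read just for the at most one id that survives LIMIT 1. COUNT(DISTINCT I.wordId) keeps sentence 4, which repeats #dreikampf, from counting as a match.

```python
import sqlite3

# Hypothetical sample corpus; only sentence 2 contains both words.
SENTENCES = {
    1: "der #dreikampf beginnt",
    2: "bruederle und der #dreikampf",
    3: "bruederle spricht",
    4: "#dreikampf #dreikampf wieder",
}


def build_db():
    """Create the three tables in memory and fill them from SENTENCES."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE words (id INTEGER, word TEXT, freq INTEGER);
        CREATE TABLE sentences (id INTEGER, sentence TEXT);
        CREATE TABLE "index" (wordId INTEGER, sentenceId INTEGER, position INTEGER);
    """)
    word_ids, freq = {}, {}
    for sid, text in SENTENCES.items():
        conn.execute("INSERT INTO sentences VALUES (?, ?)", (sid, text))
        for pos, token in enumerate(text.split()):
            wid = word_ids.setdefault(token, len(word_ids) + 1)
            freq[token] = freq.get(token, 0) + 1
            conn.execute('INSERT INTO "index" VALUES (?, ?, ?)', (wid, sid, pos))
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)",
                     [(wid, w, freq[w]) for w, wid in word_ids.items()])
    return conn


def both_words_query(conn, first, second):
    """The updated query, with the two words passed as parameters."""
    sql = """
        SELECT S.sentence FROM sentences S WHERE S.id IN (
            SELECT sentenceId FROM "index" I
            WHERE I.wordId IN (SELECT id FROM words WHERE word IN (?, ?))
            GROUP BY I.sentenceId
            HAVING COUNT(DISTINCT I.wordId) = 2
            LIMIT 1
        )
    """
    return [r[0] for r in conn.execute(sql, (first, second))]
```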
