SQL for a search query with multiple table joins

I have the following tables:

document

  • docid (PK)
  • url

wdata

  • wordid (PK)
  • word

wtitle

  • wordid
  • docid

(the combination of wordid and docid is unique)

wurl

  • wordid
  • docid

(the combination of wordid and docid is unique)

To search for a phrase, I break it into words and look up each word's wordid. The tables wtitle and wurl are used to score rows for ranking, and I intend to add more scoring tables later, such as inlink, inh1 (for h1 tags), etc. However, I am having trouble framing the SQL query for the search words.

My SQL query looks like this:

SELECT d.docid,furl,IF(t.wordid IS NULL,0,1) AS intitle,IF(u.wordid IS NULL,0,1) AS inurl FROM document d
LEFT JOIN wtitle t ON t.docid=d.docid
LEFT JOIN wdata w ON w.wordid=t.wordid
LEFT JOIN wurl u ON u.wordid=w.wordid AND u.docid=d.docid
WHERE w.wordid IN (wordid1,wordid2,wordid3)

I have the following questions:

  1. How can I check each table, both wtitle and wurl (and more tables later)? At present the query effectively searches wtitle first because of the LEFT JOIN, and the other joins are ignored.
  2. How should this SQL query be framed properly?

    SQL Fiddle: http://sqlfiddle.com/#!9/ab0052/4/0

Word ID 3 is in the URL of docid 2, but not in its title.

Word ID 3 is not in the URL of docid 3, but it is in its title.

I want to return both docs 2 and 3. However, because the query joins on wtitle first, it ignores the other joins (only the data from the first join is used).
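To see why doc 2 disappears, here is a minimal reproduction. It runs in SQLite through Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical URLs and only the two rows described above. MySQL's IF() is rewritten as a portable CASE, and the column is named url as in the table list (the posted query uses furl). The cause is the WHERE condition: w is reached only through t, so w.wordid is NULL for every document without a matching wtitle row, and `WHERE w.wordid IN (...)` throws those rows away. The LEFT JOINs therefore behave like inner joins on wtitle.

```python
import sqlite3

# SQLite stand-in for the MySQL schema; URLs and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (docid INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE wdata (wordid INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE wtitle (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
CREATE TABLE wurl (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
INSERT INTO document VALUES (2, 'http://example.com/doc2'), (3, 'http://example.com/doc3');
INSERT INTO wdata VALUES (3, 'vaccination');
INSERT INTO wurl VALUES (3, 2);    -- word 3 is in the URL of doc 2 only
INSERT INTO wtitle VALUES (3, 3);  -- word 3 is in the title of doc 3 only
""")

# The question's query, with IF() rewritten as CASE.
rows = conn.execute("""
SELECT d.docid, d.url,
       CASE WHEN t.wordid IS NULL THEN 0 ELSE 1 END AS intitle,
       CASE WHEN u.wordid IS NULL THEN 0 ELSE 1 END AS inurl
FROM document d
LEFT JOIN wtitle t ON t.docid = d.docid
LEFT JOIN wdata w ON w.wordid = t.wordid
LEFT JOIN wurl u ON u.wordid = w.wordid AND u.docid = d.docid
WHERE w.wordid IN (3, 4)
""").fetchall()
print(rows)  # [(3, 'http://example.com/doc3', 1, 0)]
```

Doc 2 has no wtitle row, so t, w and u are all NULL for it and the WHERE clause removes it.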

If you want to know whether, say, both words you are looking for occur in a document, you must look at the title and URL combined. (Otherwise, if you knew that one of the words existed in the title and one existed in the URL, you wouldn't know whether it's the same word or two different words.) So combine both tables with UNION ALL first, but remember which table each record came from. Then you can count matches both combined and per place (title or URL).
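The first step, stacking both tables with UNION ALL and tagging each row with its source, can be tried out in SQLite through Python's sqlite3 module (a stand-in for MySQL; the rows are hypothetical). Doc 4 has word 3 in both its title and its URL, while doc 5 has word 3 in its title and word 4 in its URL. Per-table "found in title" and "found in URL" flags would look identical for the two documents; only the tagged rows, which keep the word IDs, show that doc 4 contains one distinct search word and doc 5 contains two.

```python
import sqlite3

# SQLite stand-in; hypothetical rows for two documents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wtitle (wordid INTEGER, docid INTEGER);
CREATE TABLE wurl (wordid INTEGER, docid INTEGER);
INSERT INTO wtitle VALUES (3, 4), (3, 5);  -- word 3 in the titles of docs 4 and 5
INSERT INTO wurl VALUES (3, 4), (4, 5);    -- word 3 in doc 4's URL, word 4 in doc 5's URL
""")

# Stack both tables and remember where each row came from.
tagged = conn.execute("""
SELECT 'TITLE' AS place, docid, wordid FROM wtitle WHERE wordid IN (3, 4)
UNION ALL
SELECT 'URL' AS place, docid, wordid FROM wurl WHERE wordid IN (3, 4)
ORDER BY docid, place
""").fetchall()
print(tagged)  # [('TITLE', 4, 3), ('URL', 4, 3), ('TITLE', 5, 3), ('URL', 5, 4)]

# Because the word IDs survive the UNION ALL, distinct words per document are recoverable.
words_per_doc = {}
for place, docid, wordid in tagged:
    words_per_doc.setdefault(docid, set()).add(wordid)
print(words_per_doc)  # {4: {3}, 5: {3, 4}}
```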

Here is a query that looks for word IDs 3 and 4. It lists the documents matching both words first, followed by documents where only one of the words matches:

SELECT 
  d.docid, 
  d.furl, 
  w.cnt_combined,
  w.cnt_in_title,
  w.cnt_in_url
FROM document d
JOIN
(
  select
    docid,
    count(distinct wordid) cnt_combined,
    sum(place = 'TITLE') cnt_in_title,
    sum(place = 'URL') cnt_in_url
  from
  (
    select 'TITLE' as place, docid, wordid from wtitle where wordid in (3,4)
    union all
    select 'URL' as place, docid, wordid from wurl where wordid in (3,4)
  ) both_tables
  group by docid
) w ON w.docid = d.docid
order by w.cnt_combined desc;
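To check the query end to end, here is a minimal sketch that runs it in SQLite via Python's sqlite3 module instead of MySQL. The rows are hypothetical: docs 2 and 3 follow the example in the question, doc 5 contains both words (3 in its title, 4 in its URL), and doc 1 matches nothing. The sketch selects d.url because the table list names that column (the posted query uses furl), and it adds d.docid to the ORDER BY only so the output order is deterministic.

```python
import sqlite3

# SQLite stand-in for the MySQL schema; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (docid INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE wtitle (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
CREATE TABLE wurl (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
INSERT INTO document VALUES (1, 'http://example.com/doc1'), (2, 'http://example.com/doc2'),
                            (3, 'http://example.com/doc3'), (5, 'http://example.com/doc5');
INSERT INTO wtitle VALUES (3, 3), (3, 5);
INSERT INTO wurl VALUES (3, 2), (4, 5);
""")

rows = conn.execute("""
SELECT d.docid, d.url, w.cnt_combined, w.cnt_in_title, w.cnt_in_url
FROM document d
JOIN (
  SELECT docid,
         COUNT(DISTINCT wordid) AS cnt_combined,
         SUM(place = 'TITLE') AS cnt_in_title,  -- a comparison yields 0 or 1
         SUM(place = 'URL') AS cnt_in_url
  FROM (
    SELECT 'TITLE' AS place, docid, wordid FROM wtitle WHERE wordid IN (3, 4)
    UNION ALL
    SELECT 'URL' AS place, docid, wordid FROM wurl WHERE wordid IN (3, 4)
  ) both_tables
  GROUP BY docid
) w ON w.docid = d.docid
ORDER BY w.cnt_combined DESC, d.docid
""").fetchall()

for row in rows:
    print(row)
# (5, 'http://example.com/doc5', 2, 1, 1)
# (2, 'http://example.com/doc2', 1, 0, 1)
# (3, 'http://example.com/doc3', 1, 1, 0)
```

Docs 2 and 3 both come back, and doc 1 drops out because the inner JOIN keeps only documents with at least one matching row in either table.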

You can look for words instead of word IDs by replacing

where wordid in (3,4)

with

where wordid in (select wordid from wdata where word in ('vaccination', 'the'))
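When the search starts from a phrase, as in the question, the words can be bound as parameters instead of pasted into the SQL string. The following sketch is hypothetical: search_phrase is not from the original, it runs in SQLite through Python's sqlite3 rather than MySQL, it keeps only the combined count for brevity, and it assumes words are stored in lowercase in wdata. Because the wdata subquery appears in both UNION ALL branches, the word list has to be bound twice.

```python
import sqlite3

# SQLite stand-in; hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (docid INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE wdata (wordid INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE wtitle (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
CREATE TABLE wurl (wordid INTEGER, docid INTEGER, UNIQUE (wordid, docid));
INSERT INTO document VALUES (2, 'http://example.com/doc2'), (3, 'http://example.com/doc3');
INSERT INTO wdata VALUES (3, 'vaccination'), (4, 'the');
INSERT INTO wtitle VALUES (3, 3), (4, 3);
INSERT INTO wurl VALUES (3, 2);
""")

def search_phrase(conn, phrase):
    """Hypothetical helper: rank documents by how many distinct words of `phrase` they contain."""
    words = sorted(set(phrase.lower().split()))  # assumes wdata stores lowercase words
    if not words:
        return []
    marks = ", ".join("?" for _ in words)  # one placeholder per word
    sql = f"""
    SELECT d.docid, w.cnt_combined
    FROM document d
    JOIN (
      SELECT docid, COUNT(DISTINCT wordid) AS cnt_combined
      FROM (
        SELECT docid, wordid FROM wtitle
        WHERE wordid IN (SELECT wordid FROM wdata WHERE word IN ({marks}))
        UNION ALL
        SELECT docid, wordid FROM wurl
        WHERE wordid IN (SELECT wordid FROM wdata WHERE word IN ({marks}))
      ) both_tables
      GROUP BY docid
    ) w ON w.docid = d.docid
    ORDER BY w.cnt_combined DESC, d.docid
    """
    # The placeholder list appears twice, so the words are bound twice.
    return conn.execute(sql, words + words).fetchall()

print(search_phrase(conn, "the vaccination"))  # [(3, 2), (2, 1)]
```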

Rextester link: http://rextester.com/KPVX67861 (SQL Fiddle doesn't work for me most of the time.)

I suggest these covering indexes:

CREATE INDEX idx_wtitle ON wtitle(wordid, docid);
CREATE INDEX idx_wurl ON wurl(wordid, docid);

With wordid first, the DBMS can find the matching entries quickly, and because docid is also in the index, it doesn't have to access the table at all; it gets all the data it needs from the index. (This is why they are called covering indexes: they cover all the columns the query needs.)
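Whether an index is actually covering can be checked from the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for MySQL, on an empty hypothetical copy of wtitle. SQLite is expected to report a search "USING COVERING INDEX idx_wtitle" rather than a table scan; the exact wording of the plan text varies between SQLite versions. In MySQL, the corresponding sign is "Using index" in the Extra column of EXPLAIN output.

```python
import sqlite3

# Empty SQLite stand-in for wtitle with the suggested index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wtitle (wordid INTEGER, docid INTEGER);
CREATE INDEX idx_wtitle ON wtitle(wordid, docid);
""")

# Same shape as one branch of the UNION ALL: filter on wordid, read docid and wordid.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT docid, wordid FROM wtitle WHERE wordid IN (3, 4)"
).fetchall()
details = [row[-1] for row in plan]  # the last column holds the human-readable plan step
print(details)
```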
