
Postgres query to return records based on a fuzzy match with values in a set

As shown below, I have two tables: one containing information about people, and the other containing information about sports.

I want to query the person table and return only the records whose description contains a sport listed in the cost table. If description contained only the sport and no other text, I could easily do this with an inner join after converting everything to lowercase. However, because description contains additional text, I think I might need a subquery and/or a regular expression.

 name  | age |               description                
-------+-----+------------------------------------------
 bill  |  15 | I like to play soccer
 bob   |  20 | In my free time, I like to play BASEBALL
 jim   |  25 | I play video games everyday!!
 tony  |  30 | Im a really big fan of Hockey!!
 sandy |  35 | I could play soccer and hockey everyday


  sport   | cost 
----------+------
 soccer   |  100
 baseball |  150
 hockey   |  200

Ultimately, the query should return the following table, which does not include jim, because none of the words in his description appear in the sport column of the cost table. Sometimes a sport might be a single word, and other times it might be multiple words. If a sport contains multiple words, I want all of those words to appear together in the description for the record to be returned.

 name  | age |               description                
-------+-----+------------------------------------------
 bill  |  15 | I like to play soccer
 bob   |  20 | In my free time, I like to play BASEBALL
 tony  |  30 | Im a really big fan of Hockey!!
 sandy |  35 | I could play soccer and hockey everyday

I know that I could do this individually for each sport, as in the query below, but I'm hoping there is a better way.

SELECT *
FROM person
WHERE lower(description) LIKE '%hockey%';

 name  | age |               description               
-------+-----+-----------------------------------------
 tony  |  30 | Im a really big fan of Hockey!!
 sandy |  35 | I could play soccer and hockey everyday

CODE TO CREATE THE TABLES BELOW


CREATE TABLE person (name VARCHAR(10), age INT, description VARCHAR(100));
INSERT INTO person (name, age, description) VALUES ('bill', 15, 'I like to play soccer');
INSERT INTO person (name, age, description) VALUES ('bob', 20, 'In my free time, I like to play BASEBALL');
INSERT INTO person (name, age, description) VALUES ('jim', 25, 'I play video games everyday!!');
INSERT INTO person (name, age, description) VALUES ('tony', 30, 'Im a really big fan of Hockey!!');
INSERT INTO person (name, age, description) VALUES ('sandy', 35, 'I could play soccer and hockey everyday');

CREATE TABLE cost (sport VARCHAR(10), cost INT);
INSERT INTO cost (sport, cost) VALUES ('soccer', 100);
INSERT INTO cost (sport, cost) VALUES ('baseball', 150);
INSERT INTO cost (sport, cost) VALUES ('hockey', 200);

You can use a join:

-- ILIKE is a case-insensitive LIKE, so BASEBALL and Hockey match too
SELECT DISTINCT p.name, p.age, p.description
FROM person p
  JOIN cost c ON p.description ILIKE '%' || c.sport || '%';

DISTINCT is necessary to avoid getting two rows for sandy, whose description matches both soccer and hockey.

Alternatively, you can use EXISTS and a subquery:

-- ILIKE is a case-insensitive LIKE, so BASEBALL and Hockey match too
SELECT p.name, p.age, p.description
FROM person p
WHERE EXISTS (
  SELECT 1
  FROM cost c
  WHERE p.description ILIKE '%' || c.sport || '%');

EXISTS only checks whether the subquery returns at least one row, so what the subquery selects is irrelevant. A constant such as 1 is as good as anything.
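
Since the question mentions regular expressions and multi-word sports, here is a rough sketch of a stricter variant. A plain substring match would also accept a sport that appears inside a longer word, so this version uses PostgreSQL's case-insensitive regex operator ~* with the word-boundary escapes \m (start of word) and \M (end of word). It assumes the values in cost.sport contain no regex metacharacters (such as . or +) and that standard_conforming_strings is on (the default), so the backslashes are taken literally. I have not tuned it for large tables.

```sql
-- Sketch: keep a person only if some sport appears in the description
-- as a whole word or phrase, ignoring case.
-- Assumption: cost.sport holds plain text with no regex metacharacters.
SELECT p.name, p.age, p.description
FROM person p
WHERE EXISTS (
  SELECT 1
  FROM cost c
  WHERE p.description ~* ('\m' || c.sport || '\M')
);
```

A multi-word sport is matched as written, so its words must appear together, separated exactly as they are stored in the cost table.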
