
PostgreSQL: select rows based on a combination of array values

I would like to select all rows from my database in which a row contains at least two terms from a set of words (an array).

As an example, I have the following array:

'{"test", "god", "safe", "name", "hello", "pray", "stay", "word", "peopl", "rain", "lord", "make", "life", "hope", "whatever", "makes", "strong", "stop", "give", "television"}'    

I also have a tweet dataset stored in the database, so I would like to know which tweets (column name: tweet.content) contain at least two of the words.

My current code looks like this (but of course it only matches a single word):

CREATE OR REPLACE VIEW tweet_selection AS
SELECT tweet.id, tweet.content, tweet.username, tweet.geometry
FROM tweet
WHERE tweet.topic_indicator > 0.15::double precision
-- && (array overlap) is already true when a single word matches
AND string_to_array(lower(tweet.content), ' ') && '{"test", "god", "safe", "name", "hello", "pray", "stay", "word", "peopl", "rain", "lord", "make", "life", "hope", "whatever", "makes", "strong", "stop", "give", "television"}'::text[];

So the last line needs to be adjusted somehow, but I have no idea how. Maybe with an inner join?

I also have the words stored with a unique id in a different table.

A friend of mine recommended getting a count for each row, but I have no write access to add an additional column to the original tables.

Background:

I am storing my tweets in a Postgres database and I applied LDA (latent Dirichlet allocation) to the dataset. Now I have the generated topics and the words associated with each topic (20 topics and 25 words).

-- Keep only tweets that contain at least two distinct words from the list.
select distinct on (tweet.id) tweet.id, tweet.content, tweet.username, tweet.geometry
from tweet
where
    tweet.topic_indicator > 0.15::double precision
    and (
        -- count the distinct list words that also occur in this tweet
        select count(distinct word)
        from
            unnest(
                array['test', 'god', 'safe', 'name', 'hello', 'pray', 'stay', 'word', 'peopl', 'rain', 'lord', 'make', 'life', 'hope', 'whatever', 'makes', 'strong', 'stop', 'give', 'television']::text[]
            ) s(word)
            inner join
            -- split the lower-cased tweet text on spaces, one word per row
            regexp_split_to_table(lower(tweet.content), ' ') v (word) using (word)
    ) >= 2;
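
Since the question mentions that the words are also stored in a separate table, the same count could probably be computed by joining against that table instead of spelling out the array. The following is only a sketch under assumptions: the table name topic_word and its column word are hypothetical (the question does not name them), the words are assumed to be stored in lower case, and splitting on '\W+' instead of a single space is a swapped-in choice so that tokens such as "god," or "#hope" can still match.

```sql
-- Sketch only: topic_word(id, word) is a hypothetical name for the word table
-- mentioned in the question; adjust it to the real schema.
SELECT t.id, t.content, t.username, t.geometry
FROM tweet AS t
WHERE t.topic_indicator > 0.15::double precision
  AND (
        -- number of distinct list words that occur in this tweet
        SELECT count(DISTINCT w.word)
        -- split the lower-cased text on runs of non-word characters
        FROM regexp_split_to_table(lower(t.content), '\W+') AS v(token)
        JOIN topic_word AS w ON w.word = v.token
      ) >= 2;
```

Because the count is computed inside the query, nothing has to be added to the original tables. The SELECT can be placed after CREATE OR REPLACE VIEW tweet_selection AS, as in the question.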
