
Finding intersection of sets in Vertica

I'm trying to work out how to find the intersection of a pair of sets in Vertica (and whether there is a better way to do it).

I have two sets. The first set is larger and is stored in a single-column table, one item per row:

San Francisco
New York
Chicago
London
Rome

The second set is stored as a delimited string in a VARCHAR field and can include items that are not in set 1. Each row holds one such set as a single string:

San Francisco,Chicago,Tampa
Tampa,New Orleans,Miami

What I need to do is efficiently pick out the members of each second set that are also in the first set, that is, the intersection of the two sets, for further processing. For the two rows above, I need it to return:

{San Francisco,Chicago}
{}

So I need to be able to go through the table containing the second sets, get that information, and then do something else for each intersection together with the row that contains it.

Suggestions, please!

I can get as far as generating tabular output containing what you are looking for. Exporting that in JSON format, as you seem to expect, is, in my eyes, the front end's job, not the database's.

Having said that, see here:

-- input 1: one city per row
WITH city(city) AS (
          SELECT 'San Francisco'
UNION ALL SELECT 'New York'
UNION ALL SELECT 'Chicago'
UNION ALL SELECT 'London'
UNION ALL SELECT 'Rome'
)
,
-- input 2: a comma-delimited list of cities per row
cities(cities) AS (
          SELECT 'San Francisco,Chicago,Tampa'
UNION ALL SELECT 'Tampa,New Orleans,Miami'
)
,
-- end of input. Start "real" WITH clause here.
i(i) AS ( -- index for SPLIT_PART()
          SELECT  1 
UNION ALL SELECT  2 
UNION ALL SELECT  3 
UNION ALL SELECT  4 
UNION ALL SELECT  5 
UNION ALL SELECT  6 
UNION ALL SELECT  7 
UNION ALL SELECT  8 
UNION ALL SELECT  9 
UNION ALL SELECT 10
)
,
-- verticalise all those side-by-side cities, using SPLIT_PART() 
-- and the index table above
pivot_cities AS (
  SELECT DISTINCT
    SPLIT_PART(cities,',',i) AS city
  FROM cities CROSS JOIN i
)
-- INNER JOIN input 1 with the distinct verticalised cities of input 2
SELECT
  city.city
FROM city
JOIN pivot_cities USING(city)
;

-- result:
city
-------------
Chicago
San Francisco
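
The query above returns the distinct cities matched across all rows of input 2, so it does not show which row each match came from. As a rough illustration only, the same split-and-match idea can be sketched in Python while keeping one result per input row. The helper names `split_part` and `intersect_per_row` are hypothetical, and the sketch assumes that SPLIT_PART yields an empty string when the index runs past the last field, which is why the fixed index range 1 to 10 is harmless here:

```python
def split_part(text, delimiter, index):
    """Rough emulation of SQL SPLIT_PART: 1-based index.

    Assumption: an out-of-range index yields an empty string.
    """
    parts = text.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


def intersect_per_row(city_set, delimited_rows, max_items=10):
    """For each delimited string, keep the items that are also in city_set."""
    results = []
    for row in delimited_rows:
        matched = []
        # Mirrors the CROSS JOIN with the index table i (1..max_items).
        for i in range(1, max_items + 1):
            candidate = split_part(row, ",", i)
            # Empty strings from unused indexes never match a city.
            if candidate in city_set and candidate not in matched:
                matched.append(candidate)
        results.append(matched)
    return results


city = {"San Francisco", "New York", "Chicago", "London", "Rome"}
cities = ["San Francisco,Chicago,Tampa", "Tampa,New Orleans,Miami"]

per_row = intersect_per_row(city, cities)
print(per_row)
```

As in the SQL, a list with more than `max_items` entries would be silently truncated, so the upper bound has to be chosen with the data in mind.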

Here is a way of doing it that doesn't require a manual pivot using UNION ALL and an assumed maximum number of items in the list. In this example, the table t_city is the one with a single entry per row in the column, and t_cities is the one with multiple entries in the column:

WITH cte_cities AS (
    SELECT id, v_txtindex.StringTokenizerDelim(cities, ',') OVER (PARTITION BY id)
      FROM t_cities
)
    SELECT cte_cities.id AS cities_id,
           cte_cities.words AS city 
      FROM cte_cities
INNER JOIN t_city ON t_city.city = cte_cities.words
  GROUP BY cte_cities.id,
           cte_cities.words
  ORDER BY cte_cities.id;

This will return rows that include the id, which will allow you to aggregate them on the front end of your application. If you require that they are reassembled into a comma-delimited list, then you will need to install the Vertica Strings Extension Package and use the function group_concat, which should give you the results you are looking for.
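
A minimal sketch of that front-end aggregation in Python, assuming the query result arrives as (cities_id, city) tuples. The ids 1 and 2, the sample rows, and the helper name `group_matches` are made up for illustration. Because the query uses an INNER JOIN, a row of t_cities with no matching city produces no output rows at all, so the sketch takes the full list of ids separately in order to report an empty set for it:

```python
from collections import defaultdict


def group_matches(rows, all_ids):
    """Collect matched cities per cities_id; ids without matches get an empty set."""
    grouped = defaultdict(set)
    for cities_id, city in rows:
        grouped[cities_id].add(city)
    # Ids that never appear in rows had no intersection with t_city.
    return {cities_id: grouped.get(cities_id, set()) for cities_id in all_ids}


# Hypothetical result rows for the sample data in the question.
query_rows = [(1, "Chicago"), (1, "San Francisco")]
all_ids = [1, 2]

intersections = group_matches(query_rows, all_ids)
for cities_id, matched in intersections.items():
    print(cities_id, sorted(matched))
```

Sets are used because the order of cities within one id is not fixed by the query's ORDER BY.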
