Column value occurs more than X times

Question

I'm trying to select only rows where the trends.insights_taxonomy column value occurs more than X times. I've been avoiding COUNT() because I do not want to do any grouping; I want all the correlating rows to remain unique.

I'm trying to weed out outliers. For example, if I had a database of 100k people's favorite colors, I would want to ignore colors that occur fewer than 50 times.

Is this where a subquery would come in?

SELECT insights.industry, insights.city, insights.country, metrics.engagements, metrics.number_of_people_at_company, trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE insights.industry <> ""
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
AND metrics.engagements > 10

I'm not sure of the best way to format this, but the top row is the column labels and the second row is the values. In this case I only want rows where Cisco Systems occurs more than X times.

industry | city | country | engagements | people_at_company | taxonomy
Legal Counsel and Prosecution | Madison | United States | 11 | 5 | Cisco Systems

Answer 1

If you don't want to group your resulting data, then you need to determine the qualifying values before you fetch that data. Write a grouping query that returns the values occurring often enough, and then either JOIN your query above against it to gather everything without grouping, or filter with WHERE x IN (your grouping subquery, which returns the values you want to see the complete data for).
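The JOIN variant described above could look roughly like the sketch below. It reuses the table and column names from the question; the CTE name frequent and the threshold 100 (standing in for X) are assumptions made for illustration, and the query has not been run against the real table.

```sql
-- Sketch only: "frequent" and the threshold 100 are illustrative assumptions.
WITH frequent AS (
  -- Grouping happens only here, to find taxonomy values that occur often enough.
  SELECT trends.insights_taxonomy
  FROM production.scores.api_company,
  UNNEST(trends) AS trends
  GROUP BY trends.insights_taxonomy
  HAVING COUNT(*) > 100
)
SELECT
  insights.industry,
  insights.city,
  insights.country,
  metrics.engagements,
  metrics.number_of_people_at_company,
  trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
-- The join keeps every ungrouped row whose taxonomy value qualified above.
INNER JOIN frequent
  ON frequent.insights_taxonomy = trends.insights_taxonomy
WHERE insights.industry <> ""
  AND insights.city <> ""
  AND insights.country <> ""
  AND metrics.number_of_people_at_company > 0
  AND metrics.engagements > 10
```

Because the CTE returns each qualifying taxonomy value once, the join does not duplicate rows in the outer query. Note that the count is taken over all unnested trends rows, before the outer WHERE filters are applied.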

Answer 2

I figured it out using a subquery. Hopefully this is helpful for someone else.

SELECT insights.industry,insights.city,insights.country,metrics.engagements,metrics.number_of_people_at_company, trends.insights_taxonomy, trends.total_interactions
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE trends.insights_taxonomy IN 
  (SELECT trends.insights_taxonomy
  FROM production.scores.api_company,
  UNNEST(trends) AS trends
  GROUP BY insights_taxonomy
  HAVING count(*) > 100)
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0

Answer 1 | 1 | 2020-11-13 22:52:45
Answer 2 | 0 | accepted | 2020-11-16 01:54:51
