
Column value occurs more than X times

I'm trying to select only the rows where the trends.insights_taxonomy column value occurs more than X times. I've been avoiding COUNT() because I do not want to do any grouping; I want every matching row to stay as its own row in the result.

I'm trying to weed out outliers. For example, if I had a database of 100k people's favorite colors, I would want to ignore colors that occur fewer than 50 times.

Is this where a subquery would come in?

SELECT insights.industry, insights.city, insights.country, metrics.engagements, metrics.number_of_people_at_company, trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE insights.industry <> ""
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
AND metrics.engagements > 10

I'm not sure of the best way to format this, but the top row holds the column labels and the second row holds the values. In this case I only want rows like this one if Cisco Systems occurs more than X times.

industry | city | country | engagements | people_at_company | taxonomy
Legal Counsel and Prosecution | Madison | United States | 11 | 5 | Cisco Systems

If you don't want to group your resulting data, then you need to determine which values qualify before you fetch the result rows. Write a grouping query (GROUP BY with HAVING COUNT(*) > X) that returns the qualifying values, and then either JOIN your query above against it to gather everything without grouping, or filter with WHERE x IN (the grouping subquery), so that only rows whose value passed the threshold come back in full.
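
As a rough sketch of the JOIN variant, here is the pattern on a toy version of the favorite-colors example. The favorites data, its person and color columns, and the threshold of 2 are all made up for illustration; swap in your own table and your own X.

```sql
-- Hypothetical toy data standing in for the real table.
WITH favorites AS (
  SELECT 'alice' AS person, 'red' AS color UNION ALL
  SELECT 'bob', 'red' UNION ALL
  SELECT 'carol', 'red' UNION ALL
  SELECT 'dave', 'blue' UNION ALL
  SELECT 'erin', 'blue' UNION ALL
  SELECT 'frank', 'green'
),
-- Step 1: grouping query that finds the qualifying values.
common_colors AS (
  SELECT color
  FROM favorites
  GROUP BY color
  HAVING COUNT(*) >= 2  -- threshold X; 2 only because the sample is tiny
)
-- Step 2: join back so every original row survives ungrouped.
SELECT f.person, f.color
FROM favorites AS f
JOIN common_colors AS c
  ON f.color = c.color
ORDER BY f.color, f.person;
```

With this sample data the query returns the two blue rows and the three red rows, and drops the single green row.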

I figured it out using a subquery. Hopefully this is helpful for someone else.

SELECT insights.industry,insights.city,insights.country,metrics.engagements,metrics.number_of_people_at_company, trends.insights_taxonomy, trends.total_interactions
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE trends.insights_taxonomy IN 
  (SELECT trends.insights_taxonomy
  FROM production.scores.api_company,
  UNNEST(trends) AS trends
  GROUP BY trends.insights_taxonomy
  HAVING COUNT(*) > 100)
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
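
A possible alternative that keeps everything in one query is to count with a window function and filter on it with BigQuery's QUALIFY clause. This is only a sketch and I have not run it against this table. It is also not exactly equivalent to the subquery version: here the count is taken over the rows left after the UNNEST cross joins and the WHERE filters, whereas the subquery above counts raw trends entries, so the same threshold may select different taxonomies.

```sql
SELECT
  insights.industry,
  insights.city,
  insights.country,
  metrics.engagements,
  metrics.number_of_people_at_company,
  trends.insights_taxonomy,
  trends.total_interactions
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
-- Keep a row only if its taxonomy appears in more than 100 of the
-- filtered rows; rows are not collapsed, only filtered.
QUALIFY COUNT(*) OVER (PARTITION BY trends.insights_taxonomy) > 100
```

Which count you want depends on whether "occurs more than X times" should be measured before or after the other filters are applied.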
