Column value occurs more than X times
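I'm trying to select only the rows where the value in the trends.insights_taxonomy column occurs more than X times. I've been avoiding COUNT() because I'm not doing any grouping, and I want all the relevant rows to stay distinct.
I'm trying to cull outliers. For example, if I had a database of 100,000 people's favorite colors, I'd want to ignore colors that occur fewer than 50 times.
Is this where a subquery comes in?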
SELECT insights.industry, insights.city, insights.country, metrics.engagements, metrics.number_of_people_at_company, trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE insights.industry <> ""
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
AND metrics.engagements > 10
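I'm not sure of the best way to format this, but the first row is the column labels and the second row is the values. In this case, I'd only want rows like this one if Cisco Systems occurs more than X times.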
industry | city | country | engagements | people_at_company | taxonomy
Legal Counsel and Prosecution | Madison | United States | 11 | 5 | Cisco Systems
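If you don't want to group the result data, you need to work out which rows qualify before you fetch it. Write a grouped query that identifies the qualifying values. Then either join it to the query above to collect everything without grouping, or use WHERE x IN (your grouped subquery, returning the values for which you want to see the full data).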
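For what it's worth, the join variant might look something like the sketch below. The table and column names come from the question. The threshold of 100 and the name frequent_taxonomies are placeholders made up for illustration, so adjust them to your data.

```sql
-- Sketch only: frequent_taxonomies and the threshold 100 are assumptions.
WITH frequent_taxonomies AS (
  -- Grouped query: one row per taxonomy value that occurs often enough.
  SELECT trends.insights_taxonomy
  FROM production.scores.api_company,
    UNNEST(trends) AS trends
  GROUP BY trends.insights_taxonomy
  HAVING COUNT(*) > 100
)
SELECT
  insights.industry,
  insights.city,
  insights.country,
  metrics.engagements,
  metrics.number_of_people_at_company,
  trends.insights_taxonomy
FROM production.scores.api_company,
  UNNEST(insights) AS insights,
  UNNEST(metrics) AS metrics,
  UNNEST(trends) AS trends
-- The inner join keeps only detail rows whose taxonomy is in the grouped result.
INNER JOIN frequent_taxonomies AS f
  ON trends.insights_taxonomy = f.insights_taxonomy
WHERE insights.city <> ""
  AND insights.country <> ""
  AND metrics.number_of_people_at_company > 0
```

The detail rows are never grouped here. The grouping happens only inside the CTE, which acts as a lookup list of taxonomy values that pass the threshold.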
I figured it out using a subquery. Hopefully this helps someone else.
SELECT insights.industry,insights.city,insights.country,metrics.engagements,metrics.number_of_people_at_company, trends.insights_taxonomy, trends.total_interactions
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE trends.insights_taxonomy IN
(SELECT trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(trends) AS trends
GROUP BY insights_taxonomy
HAVING count(*) > 100)
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
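As a possible alternative in dialects that support QUALIFY (BigQuery does), a window count can express a similar filter without a separate grouped subquery. Treat this as a sketch, not a drop-in replacement. The window counts the rows that remain after unnesting and after the WHERE filters, so its numbers will generally differ from the subquery's counts. The threshold of 100 is carried over only as a placeholder.

```sql
-- Sketch only: counts are taken over the unnested, filtered rows,
-- which is not the same population the grouped subquery counts.
SELECT
  insights.industry,
  insights.city,
  insights.country,
  metrics.engagements,
  metrics.number_of_people_at_company,
  trends.insights_taxonomy
FROM production.scores.api_company,
  UNNEST(insights) AS insights,
  UNNEST(metrics) AS metrics,
  UNNEST(trends) AS trends
WHERE insights.city <> ""
  AND insights.country <> ""
  AND metrics.number_of_people_at_company > 0
-- Keep a row only if its taxonomy value appears in more than 100 result rows.
QUALIFY COUNT(*) OVER (PARTITION BY trends.insights_taxonomy) > 100
```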