Column value occurs more than X times

I'm trying to select only rows where the trends.insights_taxonomy column value occurs more than X times. I've been avoiding COUNT() because I do not want to do any grouping; I want every matching row to remain its own row.

I'm trying to weed out outliers. For example, if I had a database of 100k people's favorite colors, I would want to ignore colors that occur fewer than 50 times.

Is this where a subquery would come in?

SELECT insights.industry, insights.city, insights.country, metrics.engagements, metrics.number_of_people_at_company, trends.insights_taxonomy
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE insights.industry <> ""
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
AND metrics.engagements > 10

I'm not sure of the best way to format this, but the first row holds the column labels and the second row holds the values. In this case I only want rows where Cisco Systems occurs more than X times.

industry | city | country | engagements | people_at_company | taxonomy
Legal Counsel and Prosecution | Madison | United States | 11 | 5 | Cisco Systems

If you don't want to group your resulting data, you need to determine which values qualify before you select the result rows. Write a grouping query that returns the qualifying values, and then either JOIN your query above against it to gather everything without grouping, or filter with WHERE x IN (your grouping subquery, which returns the values you want to see the complete rows for).
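As a rough illustration of both options, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The `favorites` table, its columns, and the sample rows are hypothetical stand-ins for the favorite-colors example in the question, not the asker's BigQuery schema, and the threshold is passed in as a parameter.

```python
import sqlite3

# Hypothetical toy data: (person, color). Names are illustrative only.
ROWS = [
    ("ann", "blue"), ("bob", "blue"), ("cat", "blue"),
    ("dan", "red"), ("eve", "red"),
    ("fay", "teal"),
]


def rows_with_frequent_values(min_count):
    """Return ungrouped rows whose color occurs more than min_count times,
    once via WHERE ... IN and once via JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favorites (person TEXT, color TEXT)")
    conn.executemany("INSERT INTO favorites VALUES (?, ?)", ROWS)

    # Grouping happens only inside the subquery; the outer rows stay ungrouped.
    in_sql = """
        SELECT person, color
        FROM favorites
        WHERE color IN (
            SELECT color
            FROM favorites
            GROUP BY color
            HAVING COUNT(*) > ?
        )
        ORDER BY person
    """

    # Same qualifying set, joined back to the detail rows instead.
    join_sql = """
        SELECT f.person, f.color
        FROM favorites AS f
        JOIN (
            SELECT color
            FROM favorites
            GROUP BY color
            HAVING COUNT(*) > ?
        ) AS q ON q.color = f.color
        ORDER BY f.person
    """

    via_in = conn.execute(in_sql, (min_count,)).fetchall()
    via_join = conn.execute(join_sql, (min_count,)).fetchall()
    conn.close()
    return via_in, via_join
```

Calling `rows_with_frequent_values(2)` keeps only the rows for colors that appear at least three times. Both forms return the same detail rows here; which one performs better on a real data set depends on the engine and the data.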

I figured it out using a subquery; hopefully this is helpful for someone else.

SELECT insights.industry,insights.city,insights.country,metrics.engagements,metrics.number_of_people_at_company, trends.insights_taxonomy, trends.total_interactions
FROM production.scores.api_company,
UNNEST(insights) AS insights,
UNNEST(metrics) AS metrics,
UNNEST(trends) AS trends
WHERE trends.insights_taxonomy IN 
  (SELECT trends.insights_taxonomy
  FROM production.scores.api_company,
  UNNEST(trends) AS trends
  GROUP BY insights_taxonomy
  HAVING count(*) > 100)
AND insights.city <> ""
AND insights.country <> ""
AND metrics.number_of_people_at_company > 0
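A possible alternative to the IN subquery is a window count, which attaches the occurrence count to every row without collapsing them. The sketch below is not the query from this thread: it uses a hypothetical `favorites` table in an in-memory SQLite database (window functions need SQLite 3.25 or later) purely to show the shape of the technique. BigQuery also supports window functions, but this has not been run against the asker's table.

```python
import sqlite3

# Hypothetical toy data: (person, color). Names are illustrative only.
ROWS = [
    ("ann", "blue"), ("bob", "blue"), ("cat", "blue"),
    ("dan", "red"), ("eve", "red"),
    ("fay", "teal"),
]


def rows_by_window_count(min_count):
    """Return (person, color, occurrences) for colors seen more than min_count times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favorites (person TEXT, color TEXT)")
    conn.executemany("INSERT INTO favorites VALUES (?, ?)", ROWS)

    # COUNT(*) OVER (PARTITION BY color) repeats the per-color total on each row,
    # so the outer WHERE can filter on it without any GROUP BY.
    sql = """
        SELECT person, color, occurrences
        FROM (
            SELECT person,
                   color,
                   COUNT(*) OVER (PARTITION BY color) AS occurrences
            FROM favorites
        ) AS counted
        WHERE occurrences > ?
        ORDER BY person
    """
    result = conn.execute(sql, (min_count,)).fetchall()
    conn.close()
    return result
```

A side effect of this form is that the count itself is available as a column, which can help when tuning the threshold.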


