
Creating a random subset of a table with an average number of counts per key

Asked 2021-07-19 22:14:43. Tags: mysql, sql

I have a database with 1 billion key/val pairs and 20 million unique keys. On average, each key is associated with 50 vals.

key  val
key1 val1
key1 val2
key1 val3
key2 val2
key2 val7
.
.
.

I ran the following query to get the average and standard deviation of the number of vals per unique key.

-- `key` is a reserved word in MySQL, so it is quoted with backticks
select avg(cnt), stddev(cnt)
  from (select `key`, count(*) as cnt
          from original_db
         group by `key`) as counts;

This gives avg(cnt) = 50 and stddev(cnt) = 137.
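To make the per-key statistics concrete, here is a minimal Python sketch of the same computation on the five sample rows shown above. The `rows` list is toy data taken from that sample, not the real table. It uses the population standard deviation (`pstdev`), because MySQL's `stddev()` is a synonym for `stddev_pop()`.

```python
from collections import Counter
from statistics import mean, pstdev

# Toy data: the five sample rows from the question, not the real table.
rows = [
    ("key1", "val1"),
    ("key1", "val2"),
    ("key1", "val3"),
    ("key2", "val2"),
    ("key2", "val7"),
]

# Equivalent of: select key, count(*) as cnt ... group by key
counts = Counter(key for key, _ in rows)
cnt_values = list(counts.values())

# Equivalent of: select avg(cnt), stddev(cnt)
avg_cnt = mean(cnt_values)
std_cnt = pstdev(cnt_values)  # population standard deviation

print(avg_cnt, std_cnt)
```

On the real table the same two numbers come out as roughly 50 and 137, according to the question.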

I would like to create a subset of keys from this table such that the avg(cnt) of the subset is 100. In other words, each unique key in the subset table should be associated with about 100 values on average.

1 Answer

You can aggregate by key and use a window function to compute a running average of the counts (window functions require MySQL 8.0 or later):

select `key`
from (select `key`, count(*) as cnt,
             -- running average over keys ordered from largest count to smallest
             avg(count(*)) over (order by count(*) desc, `key`) as running_avg
      from t
      group by `key`
     ) t
where running_avg >= 100;

In other words, this takes all the keys that have 100+ values and then keeps adding keys with progressively smaller counts for as long as the cumulative average stays at 100 or above.

Note that this could return no keys if no key has at least 100 values.
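The selection logic can be traced by hand on a small example. The sketch below is plain Python rather than SQL. The function name `select_keys` and the toy `counts` dictionary are made up for illustration. It sorts keys by count in descending order, computes the running average, and keeps each key whose running average is still at or above the target.

```python
from itertools import accumulate

def select_keys(counts, target):
    """Keep keys, largest count first, while the running average >= target."""
    # Mirrors: order by count(*) desc, key
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    running_sums = accumulate(cnt for _, cnt in ordered)
    selected = []
    for n, ((key, _), total) in enumerate(zip(ordered, running_sums), start=1):
        # total / n is the running average over the first n keys
        if total / n >= target:
            selected.append(key)
    return selected

# Hypothetical per-key counts, chosen only to illustrate the cutoff.
counts = {"a": 300, "b": 120, "c": 60, "d": 40, "e": 10, "f": 10}
selected = select_keys(counts, 100)
print(selected)

# With no key at or above the target, nothing is selected.
empty = select_keys({"x": 5, "y": 3}, 100)
print(empty)
```

Here the running averages are 300, 210, 160, 130, 106 and 90, so keys a through e are kept and f is dropped. The subset's average is 106, which shows that the result is at or above the target rather than exactly equal to it. Because the counts are sorted in descending order, the running average never increases, so the selected keys always form a prefix of the sorted list.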
