How to create a dynamic number of (empty) buckets with MongoDB's $bucketAuto aggregation?

Question

I store metadata about files in a MongoDB database. One property is the file size in bytes, which I use for a histogram of file sizes. An example document looks like this:

{
    "_id" : ObjectId("5c52366eeb3cae00c3896b89"),
    "doc_uuid" : "bfa2734a-a262-4b14-a03f-45108ae59fde",
    "files" : [
        {
            "uuid" : "7eca2b9d-61a6-4993-99d1-b23fa0a27197",
            "filesize" : 1391908,
            ...
        },
        {
            "uuid" : "c1277835-ce41-4057-a1ae-d67cc0aa7552",
            "filesize" : 4977756,
            ...
        }
    ]
}

I want to create buckets for file sizes of 2^n bytes. For example:

{"_id" : { "min": 0, "max": 1}, "count": 12},
{"_id" : { "min": 1, "max": 2}, "count": 1},
{"_id" : { "min": 2, "max": 4}, "count": 0},
{"_id" : { "min": 4, "max": 8}, "count": 145},

To achieve this, I currently use an aggregation pipeline like this:

db.repositories.aggregate([
  { $match: { doc_uuid: { $in: ["bfa2734a-a262-4b14-a03f-45108ae59fde"] } } },
  { $unwind: "$files" },
  { $bucketAuto: {
      groupBy: "$files.filesize",
      buckets: 16,
      granularity: "POWERSOF2"
  } }
])

which works fine. Here is an example of the output for some of my real data:

{ "_id" : { "min" : 8192, "max" : 16384 }, "count" : 16 }
{ "_id" : { "min" : 16384, "max" : 2097152 }, "count" : 1 }
{ "_id" : { "min" : 2097152, "max" : 8388608 }, "count" : 1 }

I have two questions about this:

1. Because buckets is a required parameter (even when granularity: "POWERSOF2" is set), I don't know the ideal value for buckets, since I don't know in advance how many buckets there will be. Is it a good strategy to set buckets to a very high value (e.g. 1024, because it is unlikely that I will encounter a file of 2^1024 bytes or more), or is there a way to determine the number of buckets dynamically?
2. In my real data example, only buckets that contain at least one document appear in the result, each with its min/max/count values. Is it possible to also create empty buckets, so that, for instance, {"_id" : {"min": 4096, "max": 8192}, "count": 0} is included in the result set as well?

And a side question: how does MongoDB handle values that are exactly 2^n, e.g. 1024? Do those values appear in two buckets (in this case in {"min": 512, "max": 1024} and in {"min": 1024, "max": 2048})? If so, is it possible to create disjoint buckets?

Answer 1

Your first question suggests that you don't actually want $bucketAuto but rather $bucket. The whole point of $bucketAuto is that it determines the bucket boundaries automatically, based on the requested number of buckets. In your case you already know what you want the boundaries to be (powers of two) and would rather leave the number of buckets unspecified. With $bucket you pass the boundaries explicitly as an array, so the number of buckets follows from the boundaries you supply.
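
A minimal sketch of that approach in JavaScript (usable from the mongo shell or Node.js). It assumes a hypothetical upper limit of 2^40 bytes (1 TiB); the helpers powerOfTwoBoundaries and fileSizeHistogramPipeline and the "out_of_range" label are illustrative names, not part of MongoDB. In $bucket each adjacent pair of boundaries forms a bucket [lower, upper), and values outside all boundaries go into the default bucket instead of making the aggregation fail.

```javascript
// Hypothetical helper: returns [0, 1, 2, 4, ..., 2^maxExponent].
// maxExponent is an assumption you choose (40 means 2^40 bytes = 1 TiB).
function powerOfTwoBoundaries(maxExponent) {
  const boundaries = [0];
  for (let n = 0; n <= maxExponent; n++) {
    boundaries.push(Math.pow(2, n));
  }
  return boundaries;
}

// Hypothetical helper: builds a pipeline that uses $bucket with fixed
// power-of-two boundaries instead of $bucketAuto with a bucket count.
function fileSizeHistogramPipeline(docUuids, maxExponent) {
  return [
    { $match: { doc_uuid: { $in: docUuids } } },
    { $unwind: "$files" },
    {
      $bucket: {
        groupBy: "$files.filesize",
        // Each bucket covers [boundaries[i], boundaries[i + 1]).
        boundaries: powerOfTwoBoundaries(maxExponent),
        // Catches sizes < 0 or >= 2^maxExponent (illustrative label).
        default: "out_of_range",
        output: { count: { $sum: 1 } }
      }
    }
  ];
}
```

In the mongo shell this could be run as db.repositories.aggregate(fileSizeHistogramPipeline(["bfa2734a-a262-4b14-a03f-45108ae59fde"], 40)). Note that each $bucket result document has the bucket's lower bound as its _id (e.g. { "_id" : 8192, "count" : 16 }) rather than a { min, max } object.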

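If you go with this option, it also helps with your second question: with fixed bucket boundaries you know in advance which buckets should exist, so the ones that end up empty can be identified. Note, however, that $bucket itself only returns buckets that contain at least one document, so empty buckets still have to be added afterwards, for example in application code.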
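
Since $bucket omits buckets without documents, one way to get the zero-count entries is to fill them in after the query. This is a minimal sketch, assuming the $bucket output shape { _id: <lower bound>, count: <n> } and the same boundaries array that was passed to the query; fillEmptyBuckets is a hypothetical helper name. It also reshapes each entry into the { min, max } form from the question and ignores any default ("out of range") bucket.

```javascript
// Hypothetical helper: turns sparse $bucket results into a dense histogram.
// results:    $bucket output, e.g. [{ _id: 0, count: 12 }, { _id: 4, count: 145 }]
// boundaries: the sorted array that was passed to $bucket
function fillEmptyBuckets(results, boundaries) {
  // Index the returned counts by their lower bound (_id).
  const counts = new Map(results.map(r => [r._id, r.count]));
  const filled = [];
  // Each consecutive pair of boundaries defines one bucket [min, max).
  for (let i = 0; i < boundaries.length - 1; i++) {
    const min = boundaries[i];
    const max = boundaries[i + 1];
    // Buckets that $bucket did not return get a count of 0.
    filled.push({ _id: { min: min, max: max }, count: counts.has(min) ? counts.get(min) : 0 });
  }
  return filled;
}
```

For example, fillEmptyBuckets([{ _id: 0, count: 12 }, { _id: 1, count: 1 }, { _id: 4, count: 145 }], [0, 1, 2, 4, 8]) yields the four buckets from the question, including { _id: { min: 2, max: 4 }, count: 0 }.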

(Answer 1, score 0, posted 2020-01-15 18:08:39)