How to optimize a MongoDB query

Question

I have 300,000 documents in this specific collection. Each document represents one taxi trip and contains a TaxiStation number and a TaxiLicense number.

My goal is to figure out the number of trips per TaxiLicense per TaxiStation.
For example:
TaxiStation A, License X had 5 trips.
TaxiStation A, License Y had 9 trips. And so on.

How can I optimize my query? It takes upwards of 30 minutes to complete!

List /*of*/ taxistationOfCollection, taxiLicenseOfTaxistation;
        //Here I get all the distinct TaxiStation numbers in the collection
        taxistationOfCollection = coll.distinct("TaxiStation");

        BasicDBObject query, tripquery;
        int tripcount;

        //Now I have to loop through each Taxi Station
        for(int i = 0; i<taxistationOfCollection.size(); i++)
        {
            query = new BasicDBObject("TaxiStation", taxistationOfCollection.get(i));
            //Here, I make a list of each distinct Taxi License in the current Taxi station
            taxiLicenseOfTaxistation = coll.distinct("TaxiLicense", query);

            //Now I make a loop to process each Taxi License within the current Taxi station
            for(int k = 0; k<taxiLicenseOfTaxistation.size();k++)
            {
                tripcount=0;
                if(taxiLicenseOfTaxistation.get(k) !=null)
                {
                    //I'm looking for each Taxi Station with this Taxi License
                    tripquery= new BasicDBObject("TaxiStation", taxistationOfCollection.get(i)).append("TaxiLicense", taxiLicenseOfTaxistation.get(k));
                    DBCursor cursor = coll.find(tripquery);

                    try {
                        while(cursor.hasNext()) {
                            //Increasing my counter everytime I find a match
                            tripcount++;
                            cursor.next();
                        } 
                    } finally {
                        //Finally printing the results
                        System.out.println("Station: " + taxistationOfCollection.get(i) + " License:" + taxiLicenseOfTaxistation.get(k)
                                + " Trips: " + tripcount);
                    }



                }
            }
        }

Sample document:

{
  "_id" : ObjectId("53df46ed9b2ed78fb7ca4f23"),
  "Version" : "2",
  "Display" : [],
  "Generated" : "2014-08-04,16:40:05",
  "GetOff" : "2014-08-04,16:40:05",
  "GetOffCellInfo" : "46001,43027,11237298",
  "Undisplay" : [],
  "TaxiStation" : "0000",
  "GetOn" : "2014-08-04,16:40:03",
  "GetOnCellInfo" : "46001,43027,11237298",
  "TaxiLicense" : "000000",
  "TUID" : "26921876-3bd5-432e-a014-df0fb26c0e6c",
  "IMSI" : "460018571356892",
  "MCU" : "CM8001MA121225V1",
  "System_ID" : "000",
  "MeterGetOffTime" : "",
  "MeterGetOnTime" : "",
  "Setup" : [],
  "MeterSID" : "",
  "MeterWaitTime" : "",
  "OS" : "4.2",
  "PackageVersion" : "201407300888",
  "PublishVersion" : "201312060943",
  "SWVersion" : "rel_touchbox_20101010",
  "MeterMile" : 0,
  "MeterCharged" : 0,
  "GetOnLongitude" : 0,
  "GetOnLatitude" : 0,
  "GetOffLongitude" : 0,
  "TripLength" : 2,
  "GetOffLatitude" : 0,
  "Clicks" : 0,
  "updateTime" : "2014-08-04 16:40:10"
}

Answer 1

Aggregation is probably what you are looking for. With an aggregation operation your whole computation runs on the database and can be expressed in a few lines. Performance should also be a lot better, since the database does all the work in a single operation and can take advantage of indexes and other server-side optimizations.
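To make the cost of the nested loops in the question concrete, here is a rough sketch in plain Java (no MongoDB driver) that counts how many queries that approach sends: one `distinct` for the stations, one `distinct` per station, and one `find` per non-null licence per station. The class name `RoundTripSketch`, the method `nestedLoopQueries`, and the tiny sample data are all made up for illustration; an aggregation pipeline replaces all of these queries with a single command.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any MongoDB API.
class RoundTripSketch {

    /**
     * Counts the queries the nested-loop approach would send for the given
     * trips, where each trip is a {station, licence} pair.
     */
    static int nestedLoopQueries(List<String[]> trips) {
        // Collect the distinct licences seen for each distinct station.
        Map<String, Set<String>> licencesByStation = new LinkedHashMap<>();
        for (String[] trip : trips) {
            licencesByStation
                    .computeIfAbsent(trip[0], s -> new LinkedHashSet<>())
                    .add(trip[1]);
        }

        int queries = 1; // coll.distinct("TaxiStation")
        for (Set<String> licences : licencesByStation.values()) {
            queries++; // coll.distinct("TaxiLicense", query) for this station
            for (String licence : licences) {
                if (licence != null) {
                    queries++; // coll.find(tripquery) for this station/licence pair
                }
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // Made-up sample: two stations, three station/licence pairs.
        List<String[]> trips = Arrays.asList(
                new String[] {"A", "X"},
                new String[] {"A", "X"},
                new String[] {"A", "Y"},
                new String[] {"B", "Z"});
        System.out.println("nested loops: " + nestedLoopQueries(trips) + " queries");
        System.out.println("aggregation:  1 command");
    }
}
```

The query count grows with the number of distinct station/licence pairs, and each `find` also transfers every matching document to the client only to be counted there.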

From what you posted, this boils down to a simple $group operation. In the shell it would look like this:

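// "coll" stands for the name of your collection.
db.coll.aggregate([
    // Group by the (TaxiStation, TaxiLicense) pair and count the documents in each group.
    {$group: {
        _id: {station: "$TaxiStation", licence: "$TaxiLicense"},
        count: {$sum: 1}
    }}
])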

This will give you documents of the form

{_id : {station: stationid, licence: licence_number}, count: number_of_documents}
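As an illustration of what this grouping computes, here is a minimal in-memory sketch in plain Java: one pass over the trips, with one counter per (station, licence) pair. It does not talk to MongoDB; the names `TripCountSketch`, `Trip`, and `countTrips` are hypothetical and only mirror the semantics of `$group` with `$sum: 1`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not part of any MongoDB API.
class TripCountSketch {

    /** One trip, reduced to the two fields the pipeline groups on. */
    static final class Trip {
        final String station;
        final String licence;

        Trip(String station, String licence) {
            this.station = station;
            this.licence = licence;
        }
    }

    /**
     * Counts trips per (station, licence) pair in a single pass.
     * The map key plays the role of the _id sub-document.
     */
    static Map<List<String>, Integer> countTrips(List<Trip> trips) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (Trip t : trips) {
            // Arrays.asList gives value-based equals/hashCode and tolerates nulls.
            List<String> key = Arrays.asList(t.station, t.licence);
            counts.merge(key, 1, Integer::sum); // the equivalent of {$sum: 1}
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data taken from the question: A/X has 5 trips, A/Y has 9.
        List<Trip> trips = new ArrayList<>();
        for (int i = 0; i < 5; i++) trips.add(new Trip("A", "X"));
        for (int i = 0; i < 9; i++) trips.add(new Trip("A", "Y"));

        for (Map.Entry<List<String>, Integer> e : countTrips(trips).entrySet()) {
            System.out.println("Station: " + e.getKey().get(0)
                    + " License: " + e.getKey().get(1)
                    + " Trips: " + e.getValue());
        }
    }
}
```

The database does the same single pass on the server, so only one small result document per pair travels back to the client.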

For Java it would look like this:

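import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Arrays;

// Group by the (TaxiStation, TaxiLicense) pair and count the documents in each group.
DBObject taxigroup = new BasicDBObject("$group",
        new BasicDBObject("_id",
                new BasicDBObject("station", "$TaxiStation")
                        .append("licence", "$TaxiLicense"))
                .append("count", new BasicDBObject("$sum", 1)));
// coll is the DBCollection from the question.
AggregationOutput aggout = coll.aggregate(Arrays.asList(taxigroup));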

Please note that the code snippets are not tested.

Answer 1: 2, accepted, 2014-11-12 07:45:07
