
MongoDB geoWithin queries very slow

I'm running a geoWithin query with a polygon of around 500 km², and it takes a very long time to execute, anywhere between 30 seconds and 5 minutes. The collection has only 180k documents, and the polygon could be anywhere from 2 km² to 10,000 km². The server has around 4 GB of RAM. Running locally (to eliminate network lag) has no noticeable effect.

I have set up a 2dsphere index on the collection, and limited the projection to return only _id (for now).

This is what my documents look like:

{
  "_id" : ObjectId("..."),
  "geometry" : {
    "type" : "MultiPolygon",
    "coordinates" : [[...]]
  },
  "area_sq_m" : 6699.1309787227955894
}

Here are my indexes:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "db.output_areas"
    },
    {
        "v" : 1,
        "key" : {
            "geometry" : "2dsphere"
        },
        "name" : "geometry_2dsphere",
        "ns" : "db.output_areas",
        "2dsphereIndexVersion" : 2
    }
]

Here's my query:

{
    "geometry": {
        $geoWithin: {
            $geometry: {
                type: 'Polygon',
                coordinates: [[ [lng,lat], [lng,lat], [lng,lat] ...]]
            }
        }
    }
}

And here's the output from running explain():

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "db.output_areas",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "geometry" : {
        "$geoWithin" : {
          "$geometry" : {
            "type" : "Polygon",
            "coordinates" : [...]
          }
        }
      }
    },
    "winningPlan" : {
      "stage" : "PROJECTION",
      "transformBy" : {
        "_id" : 1
      },
      "inputStage" : {
        "stage" : "KEEP_MUTATIONS",
        "inputStage" : {
          "stage" : "FETCH",
          "filter" : {
            "geometry" : {
              "$geoWithin" : {
                "$geometry" : {
                  "type" : "Polygon",
                  "coordinates" : [...]
                }
              }
            }
          },
          "inputStage" : {
            "stage" : "IXSCAN",
            "keyPattern" : {
              "geometry" : "2dsphere"
            },
            "indexName" : "geometry_2dsphere",
            "isMultiKey" : true,
            "direction" : "forward",
            "indexBounds" : {
              "geometry" : [
                "[\"2f0332301\", \"2f0332301\"]",
                "[\"2f03323011\", \"2f03323011\"]",
                "[\"2f033230111\", \"2f033230112\")",
                "[\"2f033230112\", \"2f033230112\"]",
                "[\"2f0332301120\", \"2f0332301121\")",
                "[\"2f0332301121\", \"2f0332301121\"]",
                "[\"2f03323011210\", \"2f03323011211\")",
                "[\"2f03323011211\", \"2f03323011212\")",
                "[\"2f1003230\", \"2f1003230\"]",
                "[\"2f10032300\", \"2f10032300\"]",
                "[\"2f100323000\", \"2f100323001\")"
              ]
            }
          }
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "version" : "3.0.4"
  },
  "ok" : 1
}

This suggests an index is being used. If I try a smaller area the query gets faster, and with a larger area it gets slower.
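The plan above only shows that the index was chosen, not how much work each stage did. As a hedged sketch (not part of the original post): running the same query with explain("executionStats") reports how many index keys and documents were examined against how many were returned. The helper name summarizeExplain and the variable polygon are hypothetical; the field names are those of the explain format introduced in MongoDB 3.0.

```javascript
// Sketch: condense the "executionStats" section of an explain() result.
// summarizeExplain is a hypothetical helper, not a MongoDB API.
function summarizeExplain(explainOutput) {
  var s = explainOutput.executionStats;
  return {
    returned: s.nReturned,
    keysExamined: s.totalKeysExamined,
    docsExamined: s.totalDocsExamined,
    millis: s.executionTimeMillis,
    // Documents fetched per document returned. Values well above 1 mean
    // the FETCH stage filters out many candidates that the index scan passed on.
    docsPerResult: s.nReturned > 0 ? s.totalDocsExamined / s.nReturned : null
  };
}

// Usage in the mongo shell (polygon stands for the query's $geometry value):
// var out = db.output_areas
//   .find({ geometry: { $geoWithin: { $geometry: polygon } } }, { _id: 1 })
//   .explain("executionStats");
// printjson(summarizeExplain(out));
```

If docsExamined is close to returned, the time is probably going into fetching and testing the documents themselves rather than into a poorly selective index scan.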

Here are my collection stats:

{
    "ns" : "db.output_areas",
    "count" : 181408,
    "size" : 3062445568,
    "avgObjSize" : 16881,
    "numExtents" : 22,
    "storageSize" : 3927183360,
    "lastExtentSize" : 1021497344,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 2,
    "totalIndexSize" : 35606480,
    "indexSizes" : {
        "_id_" : 5894896,
        "geometry_2dsphere" : 29711584
    },
    "ok" : 1
}
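A quick arithmetic check on these stats, added here as an illustration rather than taken from the original post: the documents average roughly 16.5 KiB each, and the data alone is about 2.85 GiB, which is a large share of the server's memory. The figure of 4 GiB for RAM is an assumption based on the "around 4 GB" stated in the question, and summarizeStats is a hypothetical helper name.

```javascript
// Values copied from the collection stats above.
var stats = { count: 181408, size: 3062445568, totalIndexSize: 35606480 };

// Assumption: "around 4 GB" of RAM is taken as exactly 4 GiB.
var ramBytes = 4 * Math.pow(1024, 3);

// Hypothetical helper: derive a few ratios from the raw byte counts.
function summarizeStats(stats, ramBytes) {
  return {
    avgDocBytes: stats.size / stats.count,
    dataGiB: stats.size / Math.pow(1024, 3),
    indexMiB: stats.totalIndexSize / Math.pow(1024, 2),
    dataToRamRatio: stats.size / ramBytes
  };
}

console.log(summarizeStats(stats, ramBytes));
```

The 2dsphere index is small (about 34 MiB), so the index itself fits in memory easily; the documents that the FETCH stage has to load may not all stay cached.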

I ran db.setProfilingLevel(2), re-ran the query, then inspected the db.system.profile collection.

The first record is the actual query ( "op": "query" ),

then 7 more records with ( "op": "getmore" ), which I assume are fetching the rest of the data.

Each of these operations returns 1000 documents ( "nreturned": 1000 ), with "millis" averaging about 4000.
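Adding up the profiler figures gives a rough cross-check against the timings reported at the top. This is a sketch under an assumption: the entries below are reconstructed from the description (one "query" plus seven "getmore", each with 1000 documents and about 4000 ms), not copied from real profiler output, and profileTotals is a hypothetical helper name.

```javascript
// Reconstructed from the description above, NOT real profiler documents:
// 1 "query" + 7 "getmore", each assumed to have nreturned 1000 and millis 4000.
var entries = [{ op: "query", nreturned: 1000, millis: 4000 }];
for (var i = 0; i < 7; i++) {
  entries.push({ op: "getmore", nreturned: 1000, millis: 4000 });
}

// Hypothetical helper: total time and documents across profiler entries.
function profileTotals(entries) {
  var totalMillis = 0;
  var totalReturned = 0;
  entries.forEach(function (e) {
    totalMillis += e.millis;
    totalReturned += e.nreturned;
  });
  return {
    operations: entries.length,
    totalMillis: totalMillis,
    totalReturned: totalReturned,
    millisPerDoc: totalMillis / totalReturned
  };
}

console.log(profileTotals(entries));
```

Under that assumption the eight operations add up to about 32 seconds for 8000 documents, or roughly 4 ms per returned document, which matches the lower end of the 30 s to 5 minute range.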

I've read lots of questions where people complain about GeoJSON queries taking more than 2 s on more than 1 million documents, so I'm obviously missing something simple.

This may not be a real answer, but it might solve the problem: remove the index and try again. Compared with the execution time when the index is present, the query will be slower for small polygons but faster for larger ones.
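The comparison can also be tried without dropping the index. A minimal sketch, assuming the mongo shell: hint({ $natural: 1 }) forces a collection scan, so the same filter can be timed with and without the 2dsphere index. timeFind and polygon are hypothetical names introduced for illustration, and the timings are only indicative because the second run benefits from whatever the first run left in cache.

```javascript
// Hypothetical helper: run a find with the given hint, iterate the whole
// cursor, and report how many documents came back and how long it took.
function timeFind(collection, filter, hint) {
  var start = new Date().getTime();
  // Project only _id, as in the question, then exhaust the cursor.
  var count = collection.find(filter, { _id: 1 }).hint(hint).itcount();
  return { count: count, millis: new Date().getTime() - start };
}

// Usage in the mongo shell (polygon stands for the query's $geometry value):
// var filter = { geometry: { $geoWithin: { $geometry: polygon } } };
// printjson(timeFind(db.output_areas, filter, "geometry_2dsphere")); // index scan
// printjson(timeFind(db.output_areas, filter, { $natural: 1 }));     // collection scan
```

Running each variant more than once, and for both a small and a large polygon, gives a fairer picture of where the crossover lies.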

