I have an Item collection which could hold thousands to hundreds of thousands of documents. On that collection, I want to perform geospatial queries. Using Mongoose, there are two options: find() and the Aggregation Pipeline. I have displayed my implementations of both below:
To start, here are the relevant properties of my Mongoose Model:
// Define the schema
const itemSchema = new mongoose.Schema({
// Firebase UID (in addition to the Mongo ObjectID)
owner: {
type: String,
required: true,
ref: 'User'
},
// ... Some more fields
numberOfViews: {
type: Number,
required: true,
default: 0
},
numberOfLikes: {
type: Number,
required: true,
default: 0
},
location: {
type: {
type: 'String',
default: 'Point',
required: true
},
coordinates: {
type: [Number],
required: true,
},
}
}, {
timestamps: true
});
// 2dsphere index
itemSchema.index({ "location": "2dsphere" });
// Create the model
const Item = mongoose.model('Item', itemSchema);
// These variables are populated based on URL Query Parameters.
const match = {};
const sort = {};
// Query to make.
const query = {
location: {
$near: {
$maxDistance: parseInt(req.query.maxDistance),
$geometry: {
type: 'Point',
// parseFloat keeps the fractional degrees that parseInt would truncate.
coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
}
}
},
...match
};
// Pagination and Sorting
const options = {
limit: parseInt(req.query.limit),
skip: parseInt(req.query.skip),
sort
};
const items = await Item.find(query, undefined, options).lean().exec();
res.send(items);
Suppose distance needed to be calculated:
// These variables are populated based on URL Query Parameters.
const query = {};
const sort = {};
const geoSpatialQuery = {
$geoNear: {
near: {
type: 'Point',
// parseFloat keeps the fractional degrees that parseInt would truncate.
coordinates: [parseFloat(req.query.lng), parseFloat(req.query.lat)]
},
distanceField: "distance",
maxDistance: parseInt(req.query.maxDistance),
query,
spherical: true
}
};
const items = await Item.aggregate([
geoSpatialQuery,
// Sort first, then skip, then limit, so each page is cut from the fully ordered result.
{ $sort: { distance: -1, ...sort } },
{ $skip: parseInt(req.query.skip) },
{ $limit: parseInt(req.query.limit) }
]).exec();
res.send(items);
Here is an example of a document with all of its properties from the Item collection:
{
"_id":"5cd08927c19d1dd118d39a2b",
"imagePaths":{
"standard":{
"images":[
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-aafe69c7-f93e-411e-b75d-319042068921-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-397c95c6-fb10-4005-b511-692f991341fb-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-e54db72e-7613-433d-8d9b-8d2347440204-standard.jpg",
"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-c767f54f-7d1e-4737-b0e7-c02ee5d8f1cf-standard.jpg"
],
"profile":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-standard-profile.jpg"
},
"thumbnail":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-thumbnail.jpg",
"medium":"users/zbYmcwsGhcU3LwROLWa4eC0RRgG3/5cd08927c19d1dd118d39a2b/images/Image-51318c32-38dc-44ac-aac3-c8cc46698cfa-medium.jpg"
},
"location":{
"type":"Point",
"coordinates":[
-110.8571443,
35.4586858
]
},
"numberOfViews":0,
"numberOfLikes":0,
"monetarySellingAmount":9000,
"exchangeCategories":[
"Math"
],
"itemCategories":[
"Sports"
],
"title":"My title",
"itemDescription":"A description",
"exchangeRadius":10,
"owner":"zbYmcwsGhcU3LwROLWa4eC0RRgG3",
"reports":[],
"createdAt":"2019-05-06T19:21:13.217Z",
"updatedAt":"2019-05-06T19:21:13.217Z",
"__v":0
}
Based on the above, I wanted to ask a few questions.
Is there a performance difference between my implementations of the normal Mongoose Query and the use of the Aggregation Pipeline?
Is it correct to say that near and geoNear are pretty much similar to nearSphere when using the 2dsphere index with GeoJSON, except that geoNear provides extra data and default limiting? That is, although having different units, both queries would conceptually show relevant data within a specific radius from some location, despite the fact the field is called radius for nearSphere and maxDistance with near / geoNear.
With my example above, how might the performance loss of using skip be mitigated but still be able to achieve pagination in both querying and aggregation?
The find() function allows an optional parameter to determine which fields will be returned. The Aggregation Pipeline takes a $project stage to do the same. Is there a specific order where $project should be used in the pipeline to optimize speed/efficiency, or does it not matter?
I hope this style of question is permitted as per the Stack Overflow rules. Thank you.
I tried the query below with a 2dsphere index, using the aggregation pipeline.
db.items.createIndex({location:"2dsphere"})
The aggregation pipeline gives you more flexibility over the result set, and it can also improve the performance of geo-related searches.
db.items.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [ -110.8571443 , 35.4586858 ] },
key: "location",
distanceField: "dist.calculated",
minDistance: 2,
query: { "itemDescription": "A description" }
}
}
])
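One way to check the performance question against your own data is to ask MongoDB for the plan of each variant and compare them. The following is only a sketch: explainBoth is a hypothetical helper name, and Item is assumed to be the Mongoose model from the question. It relies on the explain() methods that Mongoose exposes on queries and aggregates.

```javascript
// Sketch: fetch the plans for both approaches so they can be compared side by side.
// `Item` is assumed to be the Mongoose model defined in the question.
async function explainBoth(Item, lng, lat, maxDistance) {
  const point = { type: 'Point', coordinates: [lng, lat] };

  // Plan for the find() + $near variant, including execution statistics.
  const findPlan = await Item
    .find({ location: { $near: { $geometry: point, $maxDistance: maxDistance } } })
    .explain('executionStats');

  // Plan for the $geoNear aggregation variant.
  const aggregatePlan = await Item
    .aggregate([
      { $geoNear: { near: point, distanceField: 'distance', maxDistance, spherical: true } }
    ])
    .explain();

  return { findPlan, aggregatePlan };
}
```

The exact shape of the explain output depends on the server version, so it is best read by hand rather than parsed.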
On your question about $skip, the question "$skip and $limit in aggregation framework" will give you more insight into the $skip operation.
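Since $geoNear returns documents ordered from nearest to farthest, one possible way to page without $skip is to restart each page at the distance where the previous page ended. The sketch below assumes that approach. buildGeoPage and the cursor object are hypothetical names, the caller's filter is assumed not to use _id, and lastIds is assumed to already contain ObjectId values, because aggregation pipelines are not cast by Mongoose.

```javascript
// Builds a $geoNear pipeline for one page without using $skip.
// `cursor` is hypothetical: { lastDistance, lastIds } taken from the previous page,
// where lastIds are the _ids already returned at exactly lastDistance.
function buildGeoPage({ lng, lat, maxDistance, limit, cursor = null, query = {} }) {
  const geoNear = {
    near: { type: 'Point', coordinates: [lng, lat] },
    distanceField: 'distance',
    maxDistance,
    spherical: true,
    query: { ...query }
  };

  if (cursor) {
    // Resume at the last distance seen instead of skipping documents.
    geoNear.minDistance = cursor.lastDistance;
    // Drop documents at the boundary distance that were already returned.
    geoNear.query._id = { $nin: cursor.lastIds };
  }

  return [{ $geoNear: geoNear }, { $limit: limit }];
}
```

The first page is built without a cursor. For the next page, pass the distance of the last document received together with the _ids of the documents that share that distance, then run the result with Item.aggregate(pipeline).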
You can use $project according to your needs. In our case, we did not have much of a performance issue using $project over 10 million documents.
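As for where $project goes: $geoNear has to be the first stage of a pipeline, so $project can only come after it. One common arrangement, sketched below under that constraint, is to page first and project last so that only the documents on the current page are reshaped. buildProjectedPipeline and its parameters are hypothetical names used for illustration.

```javascript
// Sketch: $geoNear first (required), paging next, $project last.
// `fields` is a list of field names to keep alongside the computed distance.
function buildProjectedPipeline({ lng, lat, maxDistance, skip, limit, fields }) {
  const projection = { distance: 1 };
  for (const field of fields) {
    projection[field] = 1;
  }

  return [
    {
      $geoNear: {
        near: { type: 'Point', coordinates: [lng, lat] },
        distanceField: 'distance',
        maxDistance,
        spherical: true
      }
    },
    { $skip: skip },
    { $limit: limit },
    // Only the documents that survive paging are reshaped here.
    { $project: projection }
  ];
}
```

For example, fields could be ['title', 'location', 'owner'] from the sample document; _id is included by default unless it is explicitly excluded.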