![](/img/trans.png)
[英]Poor performance for traversing all the documents from mongodb using Java
[英]Poor MongoDB read performance
我有一個包含航班信息的分片集合。 該架構看起來像:
{
"_id" : ObjectId("537ef1bb5516dd401b5b109a"),
"departureAirport" : "HAJ",
"arrivalAirport" : "AYT",
"departureDate" : NumberLong("1412553600000"),
"operatingAirlineCode" : "DE",
"operatingFlightNumber" : "1808",
"flightClass" : "P",
"fareType" : "EX",
"availability" : "*"
}
這是我收藏的統計數據:
{
"sharded" : true,
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "flights.flight",
"count" : 2809822,
"numExtents" : 30,
"size" : 674357280,
"storageSize" : 921788416,
"totalIndexSize" : 287746144,
"indexSizes" : {
"_id_" : 103499984,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
},
"avgObjSize" : 240,
"nindexes" : 2,
"nchunks" : 869,
"shards" : {
"shard0000" : {
"ns" : "flights.flight",
"count" : 1396165,
"size" : 335079600,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 144633440,
"indexSizes" : {
"_id_" : 53094944,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
},
"ok" : 1
},
"shard0001" : {
"ns" : "flights.flight",
"count" : 1413657,
"size" : 339277680,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 143112704,
"indexSizes" : {
"_id_" : 50405040,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
},
"ok" : 1
}
},
"ok" : 1
}
現在,我運行來自JAVA的查詢,如下所示:
{
"departureAirport" : "BSL",
"arrivalAirport" : "SMF",
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
"flightClass" : "C",
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
"fareType" : "OW"
}
出發日期應在一周的范圍內,可用性應大於要求的數字或“ *”。
我的問題是我該怎么做才能提高表現。 當我查詢每個主機有50個連接的數據庫時,我只能得到大約1000 ops / s,但我需要獲得大約3000-5000 ops / s。
當我在外殼中運行查詢時,光標看起來可以:
“ cursor”:“ BtreeCursor出發機場_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1”
如果我忘記了什么,請寫信給我。 提前致謝。
使用BtreeCursor的事實不能使查詢確定。 輸出explain
將有助於找出問題。
我想一個關鍵問題是查詢參數的順序:
// equality, good
"departureAirport" : "BSL",
// equality, good
"arrivalAirport" : "SMF",
// range, bad because index based range queries should be near the end
// of contiguous index-based equality checks
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low selectivity index -> remove from index and move to end
"flightClass" : "C",
// costly $or, one op. is a range query, the other one equality...
// Simply set 'availability' to a magic number instead. That's
// ugly, but optimizations are ugly and it's unlikely we see planes with
// over e.g. 900,000 seats in the next couple of decades...
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
// again, looks like low selectivity to me. Since it's already at the end,
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"
您可能需要將索引更改為類似
"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"
並以完全相同的順序查詢。 將其他所有內容附加在后面,因此必須僅對與索引中所有其他條件都匹配的那些文檔進行掃描。
我假設flightClass
和fareType
選擇性較低。 如果那不是真的,那將不是最好的解決方案。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.