
How do bson arrays compare (in mongodb/pymongo)?

I would like to store some very large integers (several thousand decimal digits) in MongoDB, exactly. This will of course not work with the standard types supported by BSON, and I am trying to think of the most elegant workaround, considering that I would like to perform range searches and similar things. This requirement excludes storing the integers as plain strings, as that makes range searches impractical.

One way I can think of is to encode the base-2^32 expansion as a (variable-length) array of standard ints, most significant digit first, and to prepend to this array an entry holding the number of digits. That way lexicographical ordering on these arrays corresponds to the usual ordering of arbitrarily large integers.

For instance, in a collection I could have the 5 documents

{"name": "me", "fortune": [1,1000]}
{"name": "scrooge mcduck", "fortune": [11,1,0,0,0,0,0,0,0,0,0,0]}
{"name": "bruce wayne","fortune": [2, 10,0]}
{"name": "bill gates", "fortune": [2,1,1000]}
{"name": "francis", "fortune": [0]}

Thus Bruce Wayne's net worth is 10*2^32, Bill Gates' 2^32+1000 and Scrooge McDuck's 2^320.
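
For reference, the encoding described above can be sketched in Python as follows. The function name `encode_fortune` is made up for illustration, and the comparison shown is Python's own list comparison, which is lexicographic; it says nothing about how MongoDB compares arrays.

```python
BASE = 2 ** 32  # digit base used in the question


def encode_fortune(n):
    """Encode a non-negative int as [digit_count, d_k, ..., d_0] in base 2**32.

    Hypothetical helper: the most significant digit comes first, and the
    leading entry is the number of digits, as in the example documents.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = []
    while n:
        n, d = divmod(n, BASE)
        digits.append(d)
    digits.reverse()
    return [len(digits)] + digits


fortunes = {
    "francis": 0,
    "me": 1000,
    "bill gates": 2 ** 32 + 1000,
    "bruce wayne": 10 * 2 ** 32,
    "scrooge mcduck": 2 ** 320,
}

# Sorting by the encoded list gives the same order as sorting by the integer.
by_encoding = sorted(fortunes, key=lambda name: encode_fortune(fortunes[name]))
by_value = sorted(fortunes, key=lambda name: fortunes[name])
```

Here `by_encoding` and `by_value` come out identical, which is the property the scheme relies on. Whether the database applies the same ordering to arrays is a separate question.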

I can then do a sort using {"fortune":1} and on my machine (with pymongo) it returns them in the order francis < me < bill < bruce < scrooge, as expected.

However, I am making assumptions that I haven't seen documented anywhere about the way BSON arrays compare, and the range searches don't seem to work the way I think (for instance,

find({"fortune":{$gte:[2,5,0]}}) 

returns no documents, whereas I expected bruce and scrooge).

Can anyone help me? Thanks

You can instead store left-padded strings that represent the exact integer value of the fortune.

eg.  "1000000" = 1 million
     "0010000" = 10 thousand
     "2000000" = 2 million
     "0200000" = 2 hundred thousand 

Left-padding with zeroes ensures that lexicographical comparison of these strings corresponds directly to their comparison as numeric values. You will have to assume a safe maximum possible value for the fortune, say a 20-digit number, and pad with zeroes accordingly. Sample documents would be:

  {"name": "scrooge mcduck", "fortune": "00001100000000000000" }
  {"name": "bruce wayne",    "fortune": "00000200000000000000" }

querying:

> db.test123.find()
{ "_id" : ObjectId("4f87e142f1573cffecd0f65e"), "name" : "bruce wayne", "fortune" : "00000200000000000000" }
{ "_id" : ObjectId("4f87e150f1573cffecd0f65f"), "name" : "donald", "fortune" : "00000150000000000000" }
{ "_id" : ObjectId("4f87e160f1573cffecd0f660"), "name" : "mickey", "fortune" : "00000000000000100000" }


> db.test123.find({ "fortune" : {$gte: "00000200000000000000"}});
{ "_id" : ObjectId("4f87e142f1573cffecd0f65e"), "name" : "bruce wayne", "fortune" : "00000200000000000000" }


> db.test123.find({ "fortune" : {$lt: "00000200000000000000"}});
{ "_id" : ObjectId("4f87e150f1573cffecd0f65f"), "name" : "donald", "fortune" : "00000150000000000000" }
{ "_id" : ObjectId("4f87e160f1573cffecd0f660"), "name" : "mickey", "fortune" : "00000000000000100000" }

Querying and sorting will work naturally, as MongoDB compares strings lexicographically. However, to do other numeric operations on your data, you will have to write custom logic in your data processing script (PHP, Python, Ruby, etc.).
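
As a rough sketch of that custom logic in Python, the helpers below convert between integers and fixed-width padded strings. The names `to_padded` and `from_padded` and the constant `WIDTH` are assumptions made for this example, not part of any library.

```python
WIDTH = 20  # assumed maximum number of decimal digits, as in the example above


def to_padded(n, width=WIDTH):
    """Return n as a zero-padded decimal string of exactly `width` characters."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    s = str(n)
    if len(s) > width:
        raise ValueError("value does not fit in the chosen width")
    return s.zfill(width)


def from_padded(s):
    """Convert a padded string back to an int for numeric work."""
    return int(s)


values = [200000000000000, 150000000000000, 100000]
padded = [to_padded(v) for v in values]

# String order and numeric order agree because every string has the same length.
same_order = sorted(padded) == [to_padded(v) for v in sorted(values)]
```

For numbers with several thousand digits the width simply grows; note that recent Python versions limit int/str conversion to 4300 digits by default, which can be raised with `sys.set_int_max_str_digits`.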

For querying and data storage, this string version should do fine.

Unfortunately your assumption about array comparison is incorrect. Range queries that, for example, query for all array values smaller than 3 ({array:{$lt:3}}) will return all arrays where at least one element is less than three, regardless of the element's position. As such your approach will not work.
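
To make the consequence concrete, here is a small Python model of the matching rule just described. It is only an illustration of that rule (`matches_lt` is a made-up function), not MongoDB's implementation.

```python
def matches_lt(array, bound):
    """Model of {field: {$lt: bound}} on an array field with a scalar bound:
    the document matches if ANY element is below the bound."""
    return any(x < bound for x in array)


fortunes = {
    "me": [1, 1000],
    "scrooge mcduck": [11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "bruce wayne": [2, 10, 0],
    "bill gates": [2, 1, 1000],
    "francis": [0],
}

# Every document contains at least one element below 3, so all of them match,
# regardless of which integer the array was meant to encode.
matched = sorted(name for name, f in fortunes.items() if matches_lt(f, 3))
```

Under this rule the position of an element carries no weight, so the length prefix and digit order cannot drive a range query.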

What does work, but is a bit less obvious, is using binary blobs for your very large integers, since those are compared byte by byte. That requires you to set an upper bit limit for your integers, so that every blob has the same length, but that should be fairly straightforward. You can test it in the shell using the BinData(subType, base64) notation:

db.col.find({fortune:{$gt:BinData(0, "e8MEnzZoFyMmD7WSHdNrFJyEk8M=")}})

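So all you'd have to do is create methods to convert your big integers from, say, strings to fixed-width big-endian binary and you're set. If you need negative values, note that plain two's complement does not sort correctly byte by byte (negative numbers have the top bit set and would sort after positive ones); flipping the sign bit fixes that. Good luck.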
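
A minimal sketch of such conversion methods in Python, for non-negative integers only. The names `to_blob` and `from_blob` and the 4096-byte width are assumptions chosen for this example; the width must be large enough for your largest value.

```python
WIDTH_BYTES = 4096  # assumed upper limit: 32768 bits, roughly 9800 decimal digits


def to_blob(n, width=WIDTH_BYTES):
    """Return n as fixed-width, big-endian, unsigned bytes.

    int.to_bytes raises OverflowError if n is negative or does not fit.
    """
    return n.to_bytes(width, "big")


def from_blob(b):
    """Convert the stored bytes back to an int."""
    return int.from_bytes(b, "big")


bruce = to_blob(10 * 2 ** 32)
bill = to_blob(2 ** 32 + 1000)
scrooge = to_blob(2 ** 320)

# Equal-length big-endian byte strings compare in the same order as the integers.
ordered = bill < bruce < scrooge

# With pymongo, the bytes could then be stored as binary data, for example
# wrapped as bson.binary.Binary(bruce); that step is not exercised here.
```

Because all blobs share one length, byte-by-byte comparison is enough to order them, which is the property range queries such as `$gt` on the blob would depend on.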
