I have a MongoDB collection in which each document holds one person's demographic information (a unique identifier, name, address, etc.).
As I load new data into the database with Python/pymongo, I find new entries for identifiers that already exist, and I need to count each variant so that in the end I can use only the most common one.
For example, if I already have "Jenn Smith" in my collection and then get two new entries for "Jennifer Smith" with the same identifier, it is the same person, so I just use Mongo's $inc to increment a counter. The document eventually looks like 'names': {'Jenn Smith': 1, 'Jennifer Smith': 2}, and in the end I use "Jennifer Smith" because it is the most common.
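A minimal sketch of the name-counting scheme just described, assuming the counts live in a "names" sub-document and that a person's document is found through a hypothetical "identifier" field. The update uses a dotted field path, so a name containing "." would be read as a nested path; that restriction is an assumption the sketch does not handle.

```python
from typing import Dict


def name_increment_update(name: str) -> dict:
    """Build a $inc update that bumps the counter for one name variant.

    The name becomes part of a dotted field path ("names.<name>"), so it
    must not contain '.' or start with '$'.
    """
    return {"$inc": {f"names.{name}": 1}}


def most_common_name(names: Dict[str, int]) -> str:
    """Return the name variant with the highest count."""
    return max(names, key=names.get)


# Hypothetical pymongo usage (needs a running server, not executed here):
# collection.update_one({"identifier": person_id},
#                       name_increment_update("Jennifer Smith"),
#                       upsert=True)

names = {"Jenn Smith": 1, "Jennifer Smith": 2}
print(most_common_name(names))  # Jennifer Smith
```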
My problem arises when I have to deal with exactly the same issue for the locations associated with Jenn Smith, because a location is a dictionary, for example: {'street': '123 Maple Street Apt A', 'city': 'Austin', 'state': 'TX'}. Sometimes I get several different locations, each one a dictionary, which so far I $push into a Mongo locations array. In the majority of cases, however, there is one predominant location per document, and any others are slight variations, e.g.: {'street': '123 Maple Street Apartment A', 'city': 'Austin', 'state': 'TX'}.
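Since the locations are already being pushed into an array, one option is to let MongoDB do the counting with an aggregation pipeline. This is a sketch, assuming a hypothetical "identifier" field for matching: $unwind emits one document per array element and $group uses the whole location sub-document as the grouping key. Grouping is by exact equality, so "Apt A" and "Apartment A" still count separately, and to my understanding sub-documents with the same fields in a different order are also treated as different. Ties are broken arbitrarily.

```python
def most_common_location_pipeline(person_id) -> list:
    """Aggregation stages that return the most frequent entry of 'locations'."""
    return [
        # Restrict to one person (the 'identifier' field name is hypothetical).
        {"$match": {"identifier": person_id}},
        # One output document per element of the locations array.
        {"$unwind": "$locations"},
        # Group identical location sub-documents and count them.
        {"$group": {"_id": "$locations", "count": {"$sum": 1}}},
        # Highest count first; keep only the winner.
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]


# Hypothetical pymongo usage (needs a running server, not executed here):
# result = list(collection.aggregate(most_common_location_pipeline("abc123")))
# top_location = result[0]["_id"] if result else None
```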
I understand that $inc can't work the same way as it does for names, since Python dictionaries aren't hashable. How should I go about finding the most common element in my locations array?
Since the dictionary is not nested, you can create a frozenset from it and hash that:
hash(frozenset(location.items()))
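A sketch of how the frozenset idea can be applied, assuming flat location dicts with hashable values. collections.Counter counts the frozensets directly, so field order does not matter. Python's built-in hash() of strings is randomized per interpreter run, so if a key has to be stored in MongoDB (for example, to use $inc on a per-location counter), a digest of a canonical JSON form is a more stable choice. The location_key helper and the "location_counts" field are hypothetical illustrations, not part of the answer above.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, Iterable


def most_common_location(locations: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Return the most frequent flat location dict (input must be non-empty).

    Equal dicts map to equal frozensets of their items, regardless of key order.
    """
    counts = Counter(frozenset(loc.items()) for loc in locations)
    key, _count = counts.most_common(1)[0]
    return dict(key)


def location_key(location: Dict[str, str]) -> str:
    """Stable hex digest of a location, safe to use as a MongoDB field name.

    Unlike hash(), this value is the same across interpreter runs.
    """
    canonical = json.dumps(location, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical pymongo usage (needs a running server, not executed here):
# k = location_key(loc)
# collection.update_one(
#     {"identifier": person_id},
#     {"$inc": {f"location_counts.{k}.count": 1},
#      "$set": {f"location_counts.{k}.value": loc}},
#     upsert=True)

locations = [
    {"street": "123 Maple Street Apt A", "city": "Austin", "state": "TX"},
    {"street": "123 Maple Street Apartment A", "city": "Austin", "state": "TX"},
    {"city": "Austin", "state": "TX", "street": "123 Maple Street Apt A"},
]
print(most_common_location(locations))
```

Note that exact matching still treats "Apt A" and "Apartment A" as different locations; only identical dicts are merged.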