
How to achieve a unique _id value in MongoDB?

I am using Python 2.7, PyMongo and MongoDB. I'm trying to get rid of the default _id values in MongoDB. Instead, I want a combination of certain fields to serve as the _id.

For example:

{
    "_id" : ObjectId("568f7df5ccf629de229cf27b"),
    "LIFNR" : "10099",
    "MANDT" : "100",
    "BUKRS" : "2646",
    "NODEL" : "",
    "LOEVM" : ""
}

I would like to concatenate LIFNR+MANDT+BUKRS as 100991002646, hash it to achieve uniqueness, and store it as the new _id.

But how far does hashing help with unique ids? And how do I achieve it?

I understand that the default hash function in Python gives different results on different machines (32-bit / 64-bit). If that is true, how would I go about generating _ids?

Whatever the approach, I need LIFNR+MANDT+BUKRS to be used. Thanks in advance.

First, you can't update the _id field. Instead, you should create a new field and set its value to the concatenated string. To return the concatenated value, use the .aggregate() method, which provides access to the aggregation pipeline. The only stage in the pipeline is the $project stage, where the $concat operator concatenates the strings and returns the result. From there, iterate the cursor and update each document using "bulk" operations.

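bulk = collection.initialize_ordered_bulk_op()
count = 0
cursor = collection.aggregate([
    {"$project": {"value": {"$concat": ["$LIFNR", "$MANDT", "$BUKRS"]}}}
])

for item in cursor:
    bulk.find({'_id': item['_id']}).update_one({'$set': {'id': item['value']}})
    count = count + 1
    if count % 200 == 0:
        # Send the batch, then start a new one: a bulk object cannot be executed twice.
        bulk.execute()
        bulk = collection.initialize_ordered_bulk_op()

# Flush the remaining operations; executing an empty batch raises InvalidOperation.
if count % 200 != 0:
    bulk.execute()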

MongoDB 3.2 deprecates Bulk() and its associated methods, so on newer versions you will need to use the bulk_write() method instead.

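from pymongo import UpdateOne

# Same aggregation as in the previous snippet.
cursor = collection.aggregate([
    {"$project": {"value": {"$concat": ["$LIFNR", "$MANDT", "$BUKRS"]}}}
])

requests = []
for item in cursor:
    requests.append(UpdateOne({'_id': item['_id']}, {'$set': {'id': item['value']}}))

# bulk_write() raises InvalidOperation on an empty list, so guard against it.
if requests:
    collection.bulk_write(requests)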

Your documents will then look like this:

{'BUKRS': '2646',
  'LIFNR': '10099',
  'LOEVM': '',
  'MANDT': '100',
  'NODEL': '',
  '_id': ObjectId('568f7df5ccf629de229cf27b'),
  'id': '100991002646'}
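
If the concatenated key really has to live in _id rather than in a separate field, one possible route, given that _id cannot be changed in place, is to build re-keyed copies of the documents and insert them into a different collection. The following is only a sketch under that assumption: rekey and KEY_FIELDS are hypothetical names, not part of PyMongo, and the target collection is left to the reader.

```python
# Fields that make up the key, taken from the question.
KEY_FIELDS = ("LIFNR", "MANDT", "BUKRS")


def rekey(doc, key_fields=KEY_FIELDS):
    """Return a copy of doc whose _id is the concatenation of key_fields.

    Hypothetical helper for illustration; the original document is not modified.
    """
    new_doc = dict(doc)  # shallow copy, enough for flat documents like these
    new_doc["_id"] = "".join(doc[field] for field in key_fields)
    return new_doc


old = {
    "_id": "568f7df5ccf629de229cf27b",  # stands in for the ObjectId
    "LIFNR": "10099",
    "MANDT": "100",
    "BUKRS": "2646",
    "NODEL": "",
    "LOEVM": "",
}
new = rekey(old)
print(new["_id"])  # 100991002646
```

Each re-keyed document could then be written with insert_one() on the new collection; because _id is always uniquely indexed, inserting a second document with the same key would fail with a duplicate key error instead of silently overwriting the first.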

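You can use your own hash function. Then it won't depend on the architecture and, more importantly, you will know exactly what it does to the values.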
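
As a minimal sketch of such an architecture-independent hash, the standard hashlib module can be applied to the joined key fields. SHA-256 and the "|" separator are choices made here for illustration, not something the answers prescribe, and make_id is a hypothetical name. Note that hashing does not add uniqueness: the combination of the three fields is what has to be unique, and the hash only gives it a fixed-length form.

```python
import hashlib


def make_id(lifnr, mandt, bukrs, sep="|"):
    """Hash the three key fields into a hex string that is stable across machines.

    Hypothetical helper; sep is an assumed separator that does not occur in the data.
    """
    # The separator keeps ("10099", "100") distinct from ("1009", "9100"),
    # which plain concatenation would merge into the same string.
    raw = sep.join([lifnr, mandt, bukrs])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


print(make_id("10099", "100", "2646"))
```

Unlike the built-in hash(), the digest depends only on the input bytes, so 32-bit and 64-bit machines produce the same value for the same fields.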

