
Python - How to find duplicated names/documents in MongoDB?

I want to find duplicated documents in my MongoDB collection based on name. I have the following code:

from pymongo import MongoClient

def Check_BFA_DB(options):
    issue_list = []
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    collection = db[options.collection]
    # Project only the name field of every document
    names = [{'$project': {'name': '$name'}}]
    name_cursor = collection.aggregate(names, cursor={})
    for name in name_cursor:
        issue_list.append(name)
        print(name)

It prints every name. How can I print only the duplicated ones?

Any help is appreciated!

The following query will show only duplicates:

db['collection_name'].aggregate([{'$group': {'_id':'$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])

How it works:

Step 1: Go over the whole collection and group the documents by the property called name ($group). For each name, count how many documents in the collection use it ($sum: 1).

Step 2: Filter the grouped results (using the $match stage) so that only groups whose count is greater than 1 (the $gt operator) remain.
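To make the two steps concrete, here is a minimal pure-Python sketch of the same logic using collections.Counter. It does not talk to MongoDB; find_duplicate_names and sample_docs are hypothetical names used only for illustration.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Return {name: count} for every name that occurs more than once."""
    # Step 1 (like $group with $sum: 1): count the documents per name.
    counts = Counter(doc.get("name") for doc in documents)
    # Step 2 (like $match with $gt: 1): keep only names seen more than once.
    return {name: count for name, count in counts.items() if count > 1}


# Hypothetical in-memory stand-in for a collection.
sample_docs = [{"name": "name1"}, {"name": "name1"}, {"name": "name2"}]
print(find_duplicate_names(sample_docs))  # prints {'name1': 2}
```

On a real collection the aggregation pipeline is preferable, because the counting happens on the server instead of pulling every document into Python.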

An example (written for the mongo shell, but easily adapted to Python):

db.a.insert({name: "name1"})
db.a.insert({name: "name1"})
db.a.insert({name: "name2"})
db.a.aggregate([{"$group": {_id:"$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])

The result is { "_id" : "name1", "count" : 2 }
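A possible Python adaptation of the shell example is sketched below. The helper names duplicate_name_pipeline and run_example are hypothetical, and the collection argument is assumed to be a PyMongo collection object (anything with insert_many and aggregate methods).

```python
def duplicate_name_pipeline(field="name"):
    """Build the two-stage pipeline: group by the field, keep counts > 1."""
    return [
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        {"$match": {"count": {"$gt": 1}}},
    ]


def run_example(collection):
    """Insert the three sample documents, then return the duplicate groups."""
    collection.insert_many(
        [{"name": "name1"}, {"name": "name1"}, {"name": "name2"}]
    )
    return list(collection.aggregate(duplicate_name_pipeline()))


# Usage, assuming PyMongo is installed and a MongoDB server listens on
# localhost:27017 (both are assumptions, not part of the original answer):
#   from pymongo import MongoClient
#   print(run_example(MongoClient("localhost", 27017)["test"]["a"]))
```

Keeping the pipeline in its own function makes it easy to reuse for a different field, for example duplicate_name_pipeline("email").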

So your code should look something like this:

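from pymongo import MongoClient

def Check_BFA_DB(options):
    issue_list = []
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    # Group by name, count each group, keep only groups with more than one document
    name_cursor = db[options.collection].aggregate([
        {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
        {'$match': {'count': {'$gt': 1}}}
    ])

    for document in name_cursor:
        # After $group, the grouped name is stored in the _id field
        name = document['_id']
        issue_list.append(name)
        print(name)

    return issue_list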
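If you also need to know which documents share a name, and not only the name itself, one option is to collect the _id values in the $group stage with the $push accumulator. The sketch below is an extension that is not part of the original answer; duplicates_with_ids_pipeline and collect_duplicates are hypothetical helper names, and collection is assumed to be a PyMongo collection.

```python
def duplicates_with_ids_pipeline(field="name"):
    """Build a pipeline that lists the _id values behind each duplicated value."""
    return [
        {"$group": {
            "_id": "$" + field,
            "count": {"$sum": 1},
            # Collect the _id of every document in the group.
            "ids": {"$push": "$_id"},
        }},
        {"$match": {"count": {"$gt": 1}}},
    ]


def collect_duplicates(collection, field="name"):
    """Return {duplicated value: [document ids]} for the given field."""
    cursor = collection.aggregate(duplicates_with_ids_pipeline(field))
    return {doc["_id"]: doc["ids"] for doc in cursor}
```

The returned ids can then be used to inspect or clean up the duplicated documents.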

By the way (unrelated to the question), the Python naming convention (PEP 8) for function names is lowercase words separated by underscores, so you might want to call it check_bfa_db()
