
Optimized way to bulk delete using a MongoDB aggregation in Python

I want to delete data from multiple collections based on the ids returned by a $match query in an aggregation.

Currently I am doing it like this in Python (using Motor), but it takes a lot of time to execute:

    data = studentSource.aggregate([
        {"$match": {'primary_source.utm_source': source_name}},
        {'$project': {'student_id': 1, '_id': 0}}
    ])

    students = [stud async for stud in data]
    if len(students) != 0:
        for i in range(len(students)):
            await studentsPrimaryDetails.delete_many({'_id': students[i]['student_id']})
            await studentSecondaryDetails.delete_many({'student_id': students[i]['student_id']})
            await studentTimeline.delete_many({'student_id': students[i]['student_id']})
            await studentApplicationForms.delete_many({'student_id': students[i]['student_id']})
            await queries.delete_many({'student_id': students[i]['student_id']})
            await leadsFollowUp.delete_many({'student_id': students[i]['student_id']})
            await lead_details.delete_many({'student_id': students[i]['student_id']})

        await studentSource.delete_many({'primary_source.utm_source': source_name})
Can anything be done to make this execute much faster? Is bulkWrite() useful here? The problem is that I have to delete data from multiple collections.
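Most of the cost in the loop above comes from round trips. It issues one delete_many per student for each of the seven related collections, which is 7 × N requests plus the final studentSource delete. Putting all ids into a single filter per collection (for example with $in) needs only one request per collection. The small offline simulation below makes that difference concrete. FakeCollection, ID_FIELDS and the helper functions are hypothetical names. No real MongoDB server or Motor client is involved; the fake only counts calls.

```python
import asyncio


class FakeCollection:
    """Hypothetical stand-in for a Motor collection: it only counts
    delete_many calls so two strategies can be compared offline."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    async def delete_many(self, filter_doc):
        self.calls += 1          # one call = one round trip to the server
        await asyncio.sleep(0)   # yield control, as a real network call would


# Collection name -> field holding the student id (mirrors the question)
ID_FIELDS = {
    "studentsPrimaryDetails": "_id",
    "studentSecondaryDetails": "student_id",
    "studentTimeline": "student_id",
    "studentApplicationForms": "student_id",
    "queries": "student_id",
    "leadsFollowUp": "student_id",
    "lead_details": "student_id",
}


async def delete_one_by_one(colls, student_ids):
    # The question's approach: one delete_many per student per collection
    for sid in student_ids:
        for name, field in ID_FIELDS.items():
            await colls[name].delete_many({field: sid})


async def delete_with_in(colls, student_ids):
    # Batched approach: one delete_many per collection, sent concurrently
    await asyncio.gather(*(
        colls[name].delete_many({field: {"$in": student_ids}})
        for name, field in ID_FIELDS.items()
    ))


def total_calls(colls):
    return sum(c.calls for c in colls.values())


async def compare(n_students):
    ids = list(range(n_students))
    loop_colls = {name: FakeCollection(name) for name in ID_FIELDS}
    in_colls = {name: FakeCollection(name) for name in ID_FIELDS}
    await delete_one_by_one(loop_colls, ids)
    await delete_with_in(in_colls, ids)
    return total_calls(loop_colls), total_calls(in_colls)


loop_calls, in_calls = asyncio.run(compare(1000))
print(loop_calls, in_calls)  # 7000 7
```

With real network latency, each of those calls also waits for a server response, so the gap in wall-clock time grows with the number of students. By contrast, bulk_write still targets a single collection per call, so on its own it does not remove the need for one request per collection.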

I second Wernfried Domscheit: delete all matching documents with $in. In Python it would be something like this:

    import asyncio

    # Runs inside an async function; the collections are Motor collection objects
    data = studentSource.aggregate([
        {"$match": {'primary_source.utm_source': source_name}},
        {'$project': {'student_id': 1, '_id': 0}}
    ])

    students = await data.to_list(length=None)
    # Distinct, non-empty ids (sid avoids shadowing the built-in id())
    ids = [sid for sid in {s.get('student_id') for s in students} if sid]

    # One delete_many per collection instead of one per student
    deletes = [
        studentsPrimaryDetails.delete_many({'_id': {'$in': ids}}),
        studentSecondaryDetails.delete_many({'student_id': {'$in': ids}}),
        studentTimeline.delete_many({'student_id': {'$in': ids}}),
        studentApplicationForms.delete_many({'student_id': {'$in': ids}}),
        queries.delete_many({'student_id': {'$in': ids}}),
        leadsFollowUp.delete_many({'student_id': {'$in': ids}}),
        lead_details.delete_many({'student_id': {'$in': ids}}),
        studentSource.delete_many({'primary_source.utm_source': source_name})
    ]
    # Send all deletes concurrently and wait for every one to finish
    await asyncio.gather(*deletes)

asyncio.gather runs all 8 requests concurrently, but depending on the driver's connection pool settings (e.g. maxPoolSize), some of them may have to wait for a free connection.
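The driver already queues requests when the pool is exhausted, so capping concurrency yourself is optional. If you want to limit how many of these deletes hold connections at the same time (for example, so other coroutines sharing the same client are not starved), one common asyncio pattern is to wrap each awaitable in an asyncio.Semaphore before passing it to gather. The sketch below is hedged: gather_limited and fake_delete are hypothetical names. fake_delete only simulates delete_many with a short sleep and records the peak number of concurrent calls, and the limit of 3 is arbitrary.

```python
import asyncio


async def gather_limited(coros, limit):
    """Run awaitables concurrently, but at most `limit` at a time.
    Results are returned in the same order as `coros`."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:          # wait for a free slot before starting
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


# Hypothetical stand-in for collection.delete_many that tracks concurrency
active = 0
peak = 0


async def fake_delete(name):
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)    # pretend to wait on the server
    active -= 1
    return name


names = ["studentsPrimaryDetails", "studentSecondaryDetails", "studentTimeline",
         "studentApplicationForms", "queries", "leadsFollowUp", "lead_details",
         "studentSource"]
results = asyncio.run(gather_limited([fake_delete(n) for n in names], limit=3))
print(results == names, peak)  # True 3
```

In real code, the list passed to gather_limited would hold the delete_many(...) calls from the answer above.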

You can put the IDs into an array and use that array as the condition. I don't know Python, but the JavaScript (mongo shell) syntax would be:

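    // source_name must already hold the utm_source value to match
    student_ids = db.studentSource.aggregate([
        {"$match": {'primary_source.utm_source': source_name}},
        {'$project': {'student_id': 1, '_id': 0}}
    ]).toArray().map( x => x.student_id)

    // studentsPrimaryDetails is keyed by _id; the other collections store the id in student_id
    db.studentsPrimaryDetails.deleteMany({'_id': {$in: student_ids }})
    db.studentSecondaryDetails.deleteMany({'student_id': {$in: student_ids }})
    db.studentTimeline.deleteMany({'student_id': {$in: student_ids }})
    db.studentApplicationForms.deleteMany({'student_id': {$in: student_ids }})
    db.queries.deleteMany({'student_id': {$in: student_ids }})
    db.leadsFollowUp.deleteMany({'student_id': {$in: student_ids }})
    db.lead_details.deleteMany({'student_id': {$in: student_ids }})
    // Finally remove the source documents themselves
    db.studentSource.deleteMany({'primary_source.utm_source': source_name})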
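The JavaScript above keeps every id in one array. The same idea can be written in Python with two hedges. First, duplicate ids are removed. Second, a very large id list is split into chunks, because a single query document has to fit within MongoDB's 16 MB BSON document size limit. unique_ids, chunked and in_filters are illustrative names, and the default chunk size of 10,000 is an assumption rather than a value from the original posts. Each resulting filter would be passed to its own delete_many call.

```python
from itertools import islice


def unique_ids(docs, key="student_id"):
    """Collect distinct, non-None ids in first-seen order
    (like .map(x => x.student_id), but deduplicated)."""
    seen = {}
    for doc in docs:
        value = doc.get(key)
        if value is not None:
            seen.setdefault(value, None)   # dict preserves insertion order
    return list(seen)


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def in_filters(ids, field="student_id", chunk_size=10_000):
    """Build one {field: {"$in": [...]}} filter per chunk of ids."""
    return [{field: {"$in": batch}} for batch in chunked(ids, chunk_size)]


docs = [{"student_id": 1}, {"student_id": 2}, {"student_id": 1}, {}, {"student_id": 3}]
ids = unique_ids(docs)
print(ids)  # [1, 2, 3]
print(in_filters(ids, chunk_size=2))
# [{'student_id': {'$in': [1, 2]}}, {'student_id': {'$in': [3]}}]
```

An empty id list yields no filters, so no delete is sent. Unlike the `if id` check in the Python answer, unique_ids drops only missing (None) values and keeps falsy ids such as 0.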

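Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.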

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM