
Importing JSON into MongoDB with Python

I am currently trying to import a lot of JSON files into MongoDB. Some of the JSON files are simple, with just object key/value pairs, and I can query those uploads from Python just fine. Example:

[
    {
        "platform_id": 28,
        "mhz": 2400,
        "version": "1.1.1l" 
    }
]

MongoDB Compass shows it like this: [screenshot not preserved]

The problem lies with one of the tools: it creates a document in Mongo that I cannot figure out how to query. The tool creates a JSON file with system information, which is pushed into the DB. Example:

{
    "systeminfo": [
        {
            "component": "system board",
            "description": "sys board123"
        },
        {
            "component": "bios",
            "version": "xyz",
            "date": "06/28/2021"
        },
        {
            "component": "processors",
            "htt": true,
            "turbo": false
        },

... etc., for a total of 23 objects.

If I push it directly into MongoDB, it looks like this in Compass: [screenshot not preserved]

So the question is: is there a way to collapse the hardware JSON by one level, or a way to query the DB as it is? I have found a way to collapse the JSON, but it moves each key/value pair into a new dictionary for upload, handling every parameter individually. That is not sustainable, because the tool keeps adding new fields and my app needs to handle those changes.
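One generic way to collapse the array by one level, without naming any individual field, is to key each entry by its `component` value before inserting. This is a minimal sketch, not the tool's or the answer's method: `collapse_systeminfo` is a hypothetical helper, and it assumes each `component` value is unique within a document and contains no `.` and no leading `$`, since MongoDB field names handle those poorly.

```python
def collapse_systeminfo(doc: dict) -> dict:
    """Collapse doc["systeminfo"] (a list of {"component": ..., ...} entries)
    into top-level keys named after each component.

    Every field of every entry is copied as-is, so fields the tool adds later
    are carried along automatically. Hypothetical helper; assumes component
    names are unique per document (a repeated name overwrites the earlier one).
    """
    # Keep all top-level fields except the array being collapsed.
    collapsed = {key: value for key, value in doc.items() if key != "systeminfo"}
    for entry in doc.get("systeminfo", []):
        # "component" becomes the key; all remaining fields become its value.
        fields = {key: value for key, value in entry.items() if key != "component"}
        collapsed[entry["component"]] = fields
    return collapsed


if __name__ == "__main__":
    raw = {
        "systeminfo": [
            {"component": "system board", "description": "sys board123"},
            {"component": "bios", "version": "xyz", "date": "06/28/2021"},
            {"component": "processors", "htt": True, "turbo": False},
        ]
    }
    print(collapse_systeminfo(raw)["processors"])  # {'htt': True, 'turbo': False}
```

After inserting a collapsed document, the processor fields can be queried with plain dot notation, for example `col.find({"processors.htt": True})`.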

Here is an example of the HW query; the same pattern works fine for the other collection:

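from pymongo import MongoClient

myclient = MongoClient()  # assumes a local server; adjust the connection string
db = myclient['db_name']
col = db[HW_collection]  # HW_collection holds the collection name
myquery = {"component": "processors"}  # finds nothing: "component" is nested in the systeminfo array
mydoc = col.find(myquery)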
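Because `component` is nested inside the `systeminfo` array, the PyMongo filter has to use dot notation to reach into it. A sketch, assuming `col` is the PyMongo collection from the snippet above; `find_processor_entries` and the two constant names are hypothetical. The positional `$` projection trims `systeminfo` down to the first element that matched the filter, which is often enough when each document has a single processors entry.

```python
# Matches any document whose systeminfo array contains at least one element
# with component == "processors"; on its own it returns the whole document.
PROCESSOR_FILTER = {"systeminfo.component": "processors"}

# Positional projection: keep "doc" (and _id, included by default) plus only
# the FIRST systeminfo element that satisfied the filter above.
PROCESSOR_PROJECTION = {"doc": 1, "systeminfo.$": 1}


def find_processor_entries(col):
    """Return matching documents from PyMongo collection `col`, with
    systeminfo reduced to the first processors element (hypothetical helper)."""
    return list(col.find(PROCESSOR_FILTER, PROCESSOR_PROJECTION))
```

Documents with no processors entry at all are not matched, so they do not appear in the result.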

Because component lives inside the systeminfo array, the query needs dot notation: {"systeminfo.component":"processors"}. The follow-up issue that almost always arises from that query is that the whole document is returned for any array that contains at least one processors entry: matching does not mean filtering. Below is a slightly more comprehensive solution that also "collapses" the info into the top-level document. Assume the input is something like this:

{
    "doc": 1, "systeminfo": [
        {"component": "system board", "description": "sys board123"},
        {"component": "bios", "version": "xyz", "date": "06/28/2021"},
        {"component": "processors", "htt": true, "turbo": false}
    ]
},{
    "doc": 2, "systeminfo": [
        {"component": "RAM", "description": "64G DIMM"},
        {"component": "processors", "htt": false, "turbo": false},
        {"component": "bios", "version": "abc", "date": "06/28/2018"}
    ]
},{
    "doc": 3, "systeminfo": [
        {"component": "RAM", "description": "32G DIMM"},
        {"component": "SCSI", "version": "X", "date": "01/01/2000"}
    ]
}
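Since the sample above is a comma-separated run of objects rather than a single JSON value, importing it from Python means wrapping it in brackets first (or having the tool emit a JSON array). A minimal sketch, assuming the JSON text is already in a string; `load_docs` and `import_docs` are hypothetical helpers, and `col` is assumed to be a PyMongo `Collection`:

```python
import json


def load_docs(text: str) -> list:
    """Parse JSON text into a list of documents.

    Accepts a JSON array, a single object, or comma-separated objects like the
    sample above (wrapped in [] before parsing). JSON true/false become
    Python True/False, which PyMongo stores as BSON booleans.
    """
    stripped = text.strip()
    if not stripped.startswith("["):
        stripped = "[" + stripped + "]"
    return json.loads(stripped)


def import_docs(col, docs: list) -> None:
    """Insert the documents unchanged into PyMongo collection `col`."""
    if docs:  # insert_many rejects an empty list
        col.insert_many(docs)
```

The documents are stored exactly as parsed, so new fields from the tool need no code changes on the import side.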

then

db.foo.aggregate([
    {$project: {
        doc: true,  // carry doc num along for ride
        // Walk the $systeminfo array and filter for component = processors and
        // assign to field P (temporary field, any name is fine):

        P: {$filter: {input: "$systeminfo", as: "z",
                      cond: {$eq:["$$z.component","processors"]} }}
    }}

    // Remove docs that had no processors:
    ,{$match: {P: {$ne:[]}}}

    // A little complex but read it "backwards" to better understand.  The P
    // array will be left with 1 entry for processors.  "Lift" that doc out of
    // the array with $arrayElemAt[0] and merge it with the info in the containing
    // top level doc which is $$CURRENT, and then make that merged entity the
    // new root (essentially the new $$CURRENT)
    ,{$replaceRoot: {newRoot: {$mergeObjects: [ {$arrayElemAt:["$P",0]}, "$$CURRENT" ]}} }

    // Get rid of the tmp field:
    ,{$unset: "P"}
]);

yields

{
    "component" : "processors",
    "htt" : true,
    "turbo" : false,
    "_id" : ObjectId("61eab547ba7d8bb5090611ee"),
    "doc" : 1
}
{
    "component" : "processors",
    "htt" : false,
    "turbo" : false,
    "_id" : ObjectId("61eab547ba7d8bb5090611ef"),
    "doc" : 2
}
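Since the question uses Python, the same pipeline can be passed to PyMongo's `Collection.aggregate` as a list of dicts. A sketch with the component name turned into a parameter; `lift_component_pipeline` is a hypothetical helper and `col` is assumed to be a PyMongo collection. The `$unset` stage requires MongoDB 4.2 or later.

```python
def lift_component_pipeline(component: str = "processors") -> list:
    """Build the aggregation pipeline above as PyMongo dicts
    (hypothetical helper; component name is a parameter)."""
    return [
        # Keep doc; P = the systeminfo elements whose component matches.
        {"$project": {
            "doc": True,
            "P": {"$filter": {
                "input": "$systeminfo",
                "as": "z",
                "cond": {"$eq": ["$$z.component", component]},
            }},
        }},
        # Drop documents where nothing matched.
        {"$match": {"P": {"$ne": []}}},
        # Merge the first matching element with the top-level document.
        {"$replaceRoot": {"newRoot": {
            "$mergeObjects": [{"$arrayElemAt": ["$P", 0]}, "$$CURRENT"],
        }}},
        # Remove the temporary field.
        {"$unset": "P"},
    ]
```

Usage would look like `for d in col.aggregate(lift_component_pipeline("bios")): print(d)`, which lifts the bios entry instead of the processors entry.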
