
Complex nested query on JSON data

I'm parsing complex JSON data with Python. The JSON data looks like the following:

{
    "data": [{
        "product_sn": "ABP-145",
        "process_data": [{
            "step_name": "step_a",
            "progress": {
                "total_steps": 10,
                "finished_steps": 10
            }
        },
        {
            "step_name": "step_b",
            "progress": {
                "total_steps": 9,
                "finished_steps": 6
            }
        },
        {
            "step_name": "step_c",
            "progress": {
                "total_steps": 15,
                "finished_steps": 15
            }
        }
        ]
    },
    {
        "product_sn": "ABP-146",
        "process_data": [{
            "step_name": "step_a",
            "progress": {
                "total_steps": 10,
                "finished_steps": 8
            }
        },
        {
            "step_name": "step_b",
            "progress": {
                "total_steps": 9,
                "finished_steps": 6
            }
        }]
    }]
}

The business scenario: producing a product involves several steps (step_a, step_b and step_c). Before step_c can start, all of the following must hold:

  1. The product has a step_a node in its process_data.
  2. That product's step_a is finished (total_steps == finished_steps).
  3. The product has no step_c node (if there is one, step_c has already been started).
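The three prerequisites above map directly onto one small predicate per product. This is a minimal sketch, assuming every step dict has the step_name and progress keys shown in the sample; the function name ready_for_step_c is hypothetical:

```python
def ready_for_step_c(product):
    """Hypothetical helper: True if the product meets all three prerequisites."""
    # Assumes each step has "step_name" and "progress" keys, as in the sample data.
    steps = product.get("process_data") or []
    # Rules 1 and 2: a step_a node exists and its progress is complete.
    step_a_finished = any(
        step["step_name"] == "step_a"
        and step["progress"]["total_steps"] == step["progress"]["finished_steps"]
        for step in steps
    )
    # Rule 3: no step_c node exists yet, i.e. step_c has not been started.
    step_c_started = any(step["step_name"] == "step_c" for step in steps)
    return step_a_finished and not step_c_started


products = [
    {"product_sn": "ABP-145", "process_data": [
        {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}},
        {"step_name": "step_c", "progress": {"total_steps": 15, "finished_steps": 15}},
    ]},
    {"product_sn": "ABP-146", "process_data": [
        {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 8}},
    ]},
    {"product_sn": "ABP-147", "process_data": [
        {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}},
        {"step_name": "step_b", "progress": {"total_steps": 9, "finished_steps": 6}},
    ]},
]

print([p["product_sn"] for p in products if ready_for_step_c(p)])  # ['ABP-147']
```

Because each rule is a separate named boolean, adding a fourth business rule means adding one more line rather than another level of loop nesting.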

I want to get the product_sn of every product that is ready to start step_c.

Currently I'm using several nested for loops to walk the nested dicts and lists returned by json.loads(). The code is long, complex and hard to maintain. Is there a simpler, JSONPath-like way to express this, something like:

get(
    value=data.product_sn,
    criteria=(
        data.process_data(step_name=="step_a").
            progress("total_steps".value == "finished_steps".value) and 
        $not_exist data.process_data.step_name=="step_c"
    )
)

That way I would get every product_sn that matches the search criteria.
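Python has no built-in query syntax for this, but the shape of the wished-for get(value=..., criteria=...) call can be approximated in plain Python with a few small combinators. This is only a sketch: get, has_step, lacks_step, all_of and finished are hypothetical helpers written for illustration, not part of the json module or of any JSONPath library.

```python
import json


def has_step(name, condition=lambda step: True):
    """Hypothetical criterion: the product has a step `name` satisfying `condition`."""
    return lambda product: any(
        step["step_name"] == name and condition(step)
        for step in product.get("process_data") or []
    )


def lacks_step(name):
    """Hypothetical criterion: the product has no step called `name`."""
    return lambda product: not has_step(name)(product)


def all_of(*criteria):
    """Hypothetical criterion: every given criterion holds."""
    return lambda product: all(c(product) for c in criteria)


def finished(step):
    """A step is finished when all of its sub-steps are done."""
    progress = step["progress"]
    return progress["total_steps"] == progress["finished_steps"]


def get(data, value, criteria):
    """Hypothetical query helper: return item[value] for items matching criteria."""
    return [item[value] for item in data if criteria(item)]


doc = json.loads('''{"data": [
  {"product_sn": "ABP-145", "process_data": [
    {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}},
    {"step_name": "step_c", "progress": {"total_steps": 15, "finished_steps": 15}}]},
  {"product_sn": "ABP-147", "process_data": [
    {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}}]}
]}''')

result = get(
    doc["data"],
    value="product_sn",
    criteria=all_of(has_step("step_a", finished), lacks_step("step_c")),
)
print(result)  # ['ABP-147']
```

The criteria line reads close to the pseudocode in the question, and new rules become new small functions instead of deeper loops.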

I searched for examples and tried jsonpath_ng and jsonpath_rw, but the examples I found are very simple. Could anyone show me how to implement the query above in a simpler way? I really don't want to keep writing long, complex and ugly nested for loops.

Below is my current code for handling this JSON (it has been simplified a lot for this question; the actual business logic is far more complex):

import json

json_str = '''{
    "data": [{
        "product_sn": "ABP-145",
        "process_data": [{
            "step_name": "step_a",
            "progress": {
                "total_steps": 10,
                "finished_steps": 10
            }
        },
        {
            "step_name": "step_b",
            "progress": {
                "total_steps": 9,
                "finished_steps": 6
            }
        },
        {
            "step_name": "step_c",
            "progress": {
                "total_steps": 15,
                "finished_steps": 15
            }
        }
        ]
    },
    {
        "product_sn": "ABP-146",
        "process_data": [{
            "step_name": "step_a",
            "progress": {
                "total_steps": 10,
                "finished_steps": 8
            }
        },
        {
            "step_name": "step_b",
            "progress": {
                "total_steps": 9,
                "finished_steps": 6
            }
        }]
    },
    {
        "product_sn": "ABP-147",
        "process_data": [{
            "step_name": "step_a",
            "progress": {
                "total_steps": 10,
                "finished_steps": 10
            }
        },
        {
            "step_name": "step_b",
            "progress": {
                "total_steps": 9,
                "finished_steps": 6
            }
        }]
    }]
}'''

json_obj = json.loads(json_str)
valid_products = list()
for product in json_obj.get('data'):
    product_sn = product['product_sn']
    process_data = product.get("process_data")
    if not process_data:
        continue
    valid_product = False
    for step in process_data:
        step_name = step['step_name']

        if step_name == 'step_c':
            valid_product = False
            break
        elif step_name == 'step_a':
            progress = step['progress']
            if progress['total_steps'] == progress['finished_steps']:
                valid_product = True
            else:
                valid_product = False
                break

    if valid_product:
        valid_products.append(product_sn)
    else:
        continue

print(valid_products)
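Much of the nesting in the loop above comes from scanning process_data for a step by name. One way to flatten it, sketched here under the assumption that step names are unique within a product, is to index each product's steps by step_name first; after that every rule is a dictionary lookup. The names steps_by_name and ready_product_sns are hypothetical.

```python
import json


def steps_by_name(product):
    """Hypothetical helper: map step_name -> progress dict for one product.

    Assumes step names are unique per product; a later duplicate would
    silently overwrite an earlier one.
    """
    return {step["step_name"]: step["progress"]
            for step in product.get("process_data") or []}


def ready_product_sns(json_str):
    """Hypothetical helper: product_sn of every product ready to start step_c."""
    ready = []
    for product in json.loads(json_str).get("data", []):
        steps = steps_by_name(product)
        step_a = steps.get("step_a")
        if (step_a is not None
                and step_a["total_steps"] == step_a["finished_steps"]
                and "step_c" not in steps):
            ready.append(product["product_sn"])
    return ready


sample = ('{"data": [{"product_sn": "ABP-147", "process_data": ['
          '{"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}}]}]}')
print(ready_product_sns(sample))  # ['ABP-147']
```

Called with the json_str from the question, this should return ['ABP-147'], the same result as the loop above, since ABP-145 already has step_c and ABP-146's step_a is unfinished.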

Assuming your parsed JSON object is stored in the variable o:

prods = [p['product_sn'] for p in o['data'] if [a for a in p['process_data'] if a['step_name']=="step_a" and a['progress']['total_steps']==a['progress']['finished_steps']] and not [c for c in p['process_data'] if c['step_name']=="step_c"]]

Sorry for the long one-liner; I don't have PyCharm at hand right now to break it into several lines so it reads nicely.

You can check the working code here: link to repl.it
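Broken over several lines, and with any() in place of the inner list comprehensions (so each scan stops at the first matching step instead of building a throwaway list), the same one-liner could read as follows. Like the original, this assumes o holds the parsed JSON and that every product has a process_data list; the small inline sample is only there to make the snippet runnable.

```python
import json

# Small sample in the same shape as the question's data; o is the parsed object.
o = json.loads('''{"data": [
  {"product_sn": "ABP-146", "process_data": [
    {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 8}}]},
  {"product_sn": "ABP-147", "process_data": [
    {"step_name": "step_a", "progress": {"total_steps": 10, "finished_steps": 10}},
    {"step_name": "step_b", "progress": {"total_steps": 9, "finished_steps": 6}}]}
]}''')

prods = [
    p["product_sn"]
    for p in o["data"]
    # step_a exists and is finished...
    if any(
        a["step_name"] == "step_a"
        and a["progress"]["total_steps"] == a["progress"]["finished_steps"]
        for a in p["process_data"]
    )
    # ...and step_c has not been started yet
    and not any(c["step_name"] == "step_c" for c in p["process_data"])
]
print(prods)  # ['ABP-147']
```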

You could use a more functional approach to make that a bit cleaner.

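import json
from operator import itemgetter

json_obj = json.loads(json_str)  # json_str as defined in the question

products = json_obj.get("data", [])

# process_data is a list of steps, so each rule has to scan it with any()
valid_products = filter(
    lambda p: any(
        s["step_name"] == "step_a"
        and s["progress"]["total_steps"] == s["progress"]["finished_steps"]
        for s in p.get("process_data") or []
    ) and not any(
        s["step_name"] == "step_c" for s in p.get("process_data") or []
    ),
    products
)
# filter() and map() are lazy in Python 3, so materialize the result with list()
valid_product_sns = list(map(itemgetter("product_sn"), valid_products))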

Of course, that filtering lambda is still pretty ugly.
