
Sum values grouped by the same key in a list of dicts and build lists

Hi guys, I have data like this:

    batches = [
        {
            'name': 'snow 7',
            'count': 1,
            'rows_processed': None,
            'pipelines': 1
        },
        {
            'name': 'snow 6',
            'count': 1,
            'rows_processed': None,
            'pipelines': 1
        },
        {
            'name': 'snow 6',
            'count': 1,
            'rows_processed': None,
            'pipelines': 1
        },
        {
            'name': 'snow 6',
            'count': 2,
            'rows_processed': None,
            'pipelines': 2
        },
        {
            'name': 'snow 5',
            'count': 2,
            'rows_processed': 4,
            'pipelines': 2
        },
        {
            'name': 'snow 4',
            'count': 2,
            'rows_processed': None,
            'pipelines': 2
        }
    ]

I want to sum the values of rows_processed and pipelines grouped by the name key, counting None as 0. For example, the pipelines sum for snow 6 is 1 + 1 + 2 = 4, and so on. The final data should look like this:

    {
     "Rows Processed": [0, 0, 4, 0],
     "Pipelines Processed": [1, 4, 2, 2]
    }

How can I build data like the above? This is what I have done so far:

    rows_processed = {}
    pipeline_processed = {}
    for batch in batches:
        for label in batch.keys():
            rows_processed[label] = rows_processed.get(batch['rows_processed'], 0) + batch['rows_processed'] if batch['rows_processed'] else 0
    for batch in batches:
        for label in batch.keys():
            pipeline_processed[label] = pipeline_processed.get(batch['pipelines'], 0) + batch['pipelines'] if \
            batch['pipelines'] else 0
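Part of what goes wrong in the attempt above is operator precedence: a conditional expression binds more loosely than +, so total + value if value else 0 is read as (total + value) if value else 0, and any None resets the running total to 0. (The loops also key the results by the field labels from batch.keys() rather than by batch['name'], and look up the old total by value instead of by name.) The small sketch below uses made-up numbers, not the question's data, to contrast the two readings:

```python
# Hypothetical values: a running total of 3, then a batch whose value is None.
total = 3
value = None

# A conditional expression binds more loosely than "+", so this parses as
# (total + value) if value else 0 and discards the running total.
buggy = total + value if value else 0

# Parenthesising the fallback keeps the total and treats None as 0.
fixed = total + (value or 0)

print(buggy, fixed)  # prints: 0 3
```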

One way is to use a two-level defaultdict together with Boolean operations (value or 0 turns None into 0):

    >>> from collections import defaultdict
    >>>
    >>> d = defaultdict(lambda: defaultdict(int))
    >>> for batch in batches:
    ...     d['Rows Processed'][batch['name']] += batch['rows_processed'] or 0
    ...     d['Pipelines Processed'][batch['name']] += batch['pipelines'] or 0
    ...
    >>> list(d['Rows Processed'].values())
    [0, 0, 4, 0]
    >>> list(d['Pipelines Processed'].values())
    [1, 4, 2, 2]
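The two lists above can also be assembled directly into the dictionary shape the question asks for. This is a minimal sketch, assuming the input list is called batches as in the question; the helper name sum_by_name and the LABELS mapping are hypothetical. It relies on dicts keeping insertion order (guaranteed since Python 3.7), so each list position corresponds to one name in order of first appearance, and it also returns those names so the positions are not anonymous:

```python
from collections import defaultdict

# Hypothetical mapping from output labels to the input fields they sum.
LABELS = {
    "Rows Processed": "rows_processed",
    "Pipelines Processed": "pipelines",
}


def sum_by_name(batches, labels=LABELS):
    """Sum each field per 'name', counting None as 0.

    Returns the distinct names in first-seen order and a dict mapping each
    output label to a list of totals aligned with those names.
    """
    # dict.fromkeys de-duplicates while keeping first-appearance order.
    names = list(dict.fromkeys(batch["name"] for batch in batches))
    totals = {label: defaultdict(int) for label in labels}
    for batch in batches:
        for label, field in labels.items():
            totals[label][batch["name"]] += batch[field] or 0
    return names, {label: [totals[label][n] for n in names] for label in labels}


batches = [
    {"name": "snow 7", "count": 1, "rows_processed": None, "pipelines": 1},
    {"name": "snow 6", "count": 1, "rows_processed": None, "pipelines": 1},
    {"name": "snow 6", "count": 1, "rows_processed": None, "pipelines": 1},
    {"name": "snow 6", "count": 2, "rows_processed": None, "pipelines": 2},
    {"name": "snow 5", "count": 2, "rows_processed": 4, "pipelines": 2},
    {"name": "snow 4", "count": 2, "rows_processed": None, "pipelines": 2},
]

names, result = sum_by_name(batches)
print(names)   # ['snow 7', 'snow 6', 'snow 5', 'snow 4']
print(result)  # {'Rows Processed': [0, 0, 4, 0], 'Pipelines Processed': [1, 4, 2, 2]}
```

Adding another summed field only needs one more entry in the label mapping; the loop itself does not change.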

Hey guys, I solved the problem above with the following code, but I'm not sure whether this is the right approach. If anyone has a better approach, please let me know.

    rows_processed = {}
    pipeline_processed = {}
    for batch in batches:
        name = batch['name']
        # Parenthesise the fallback: "a + b if b else 0" parses as "(a + b) if b else 0",
        # which would reset an existing total to 0 whenever a later value is None.
        rows_processed[name] = rows_processed.get(name, 0) + (batch['rows_processed'] or 0)
        pipeline_processed[name] = pipeline_processed.get(name, 0) + (batch['pipelines'] or 0)
    print(list(rows_processed.values()))      # [0, 0, 4, 0]
    print(list(pipeline_processed.values()))  # [1, 4, 2, 2]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM