
Python iterate through list of dictionaries

I have the below list of dictionaries -

results = [
     {'type': 'check_datatype',
      'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
      'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False}
    ]

I want an output list like the one below, where the key and value fields come from the 'kwargs' key of each dictionary above -

id|key|value|index

[[1,table,cars,null],[1,columns,car_id,1],[1,columns,index,2],[1,d_type,str,null],[2,table,cars,null],[2,columns,car_id,null],[2,string_length,6,null]]

Update - Now I want one more column in the output, uniquehashcode. Here a unique hashcode means: dictionaries with the same keys and values should generate the same id or hash. Hence, if the key-value pairs in the 'kwargs' dictionary are the same, they should return the same hashcode. Output should be like this -

[[1,table,cars,null,uniquehashcode1],[1,columns,car_id,1,uniquehashcode1],[1,columns,index,2,uniquehashcode1],[1,d_type,str,null,uniquehashcode1],[2,table,cars,null,uniquehashcode2],[2,columns,car_id,null,uniquehashcode2],[2,string_length,6,null,uniquehashcode2]]

Also, I don't want to insert anything into this table if a particular uniquehashcode already exists.

Update2: I want to create a dataframe with the below schema. args_id should be the same for each unique pair of (kwargs, check_name). I want to run the above list of dictionaries every day, so across different dates args_id should stay the same whenever a unique (kwargs, check_name) pair comes again. I want to store this result in a dataframe every day and then append it to my Delta table in Spark.

Type|time|args_id
check_datatype|2021-03-29|0
check_string_consistency|2021-03-29|1
check_datatype|2021-03-30|0

Until now, I was using the below code -

from pyspark.sql import SparkSession, Window, functions as F

# extract the 'type' of each check into single-column rows
type_results = [[elt['type']] for elt in results]
checkColumns = ['type']

spark = SparkSession.builder.getOrCreate()
DF = spark.createDataFrame(data=type_results, schema=checkColumns)
DF = DF.withColumn("time", F.current_timestamp())
DF = DF.withColumn("args_id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))
To simply iterate over the 'kwargs' of each dictionary in results:

for each in results:
    print(each['kwargs'])

Probably you need:

# 'results' as defined in the question

result_list = []
for c, l in enumerate(results, start=1):  # c is the 1-based id per dictionary
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            if len(value) == 1:
                # single-element lists get a null index, matching the desired output
                result_list.append([str(c), key, value[0], 'null'])
                continue
            # enumerate avoids value.index(i), which is wrong for duplicate entries
            for i, v in enumerate(value, start=1):
                result_list.append([str(c), key, v, str(i)])
        else:
            result_list.append([str(c), key, value, 'null'])

print(result_list)

Output:

[['1', 'table', 'cars', 'null'], ['1', 'columns', 'car_id', '1'], ['1', 'columns', 'index', '2'], ['1', 'd_type', 'str', 'null'], ['2', 'table', 'cars', 'null'], ['2', 'columns', 'car_id', 'null'], ['2', 'string_length', 6, 'null']]

As for the Update part, you can use the maps package (pip install maps). A third dictionary, identical to the second, is added below to show that equal kwargs produce the same hash:

import maps
results = [
     {'type': 'check_datatype',
      'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
      'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
     'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False}
    ]
 
result_list = []
for c, l in enumerate(results, start=1):
    # one hash per dictionary, from a recursively frozen (hashable) view of kwargs
    h = hash(maps.FrozenMap.recurse(l['kwargs']))
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            if len(value) == 1:
                result_list.append([str(c), key, value[0], 'null', f'{h}-{c}'])
                continue
            for i, v in enumerate(value, start=1):
                result_list.append([str(c), key, v, str(i), f'{h}-{c}'])
        else:
            result_list.append([str(c), key, value, 'null', f'{h}-{c}'])

print(result_list)

Output:

[['1', 'table', 'cars', 'null', '-6654319495930648246-1'], ['1', 'columns', 'car_id', '1', '-6654319495930648246-1'], ['1', 'columns', 'index', '2', '-6654319495930648246-1'], ['1', 'd_type', 'str', 'null', '-6654319495930648246-1'], ['2', 'table', 'cars', 'null', '-3876605863049152209-2'], ['2', 'columns', 'car_id', 'null', '-3876605863049152209-2'], ['2', 'string_length', 6, 'null', '-3876605863049152209-2'], ['3', 'table', 'cars', 'null', '-3876605863049152209-3'], ['3', 'columns', 'car_id', 'null', '-3876605863049152209-3'], ['3', 'string_length', 6, 'null', '-3876605863049152209-3']]
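
Two caveats if you rely on these hashcodes across daily runs. First, the f'{h}-{c}' suffix makes the full string differ even when the kwargs hash matches: rows 2 and 3 above share the hash -3876605863049152209 but still get different final strings because of the appended counter, so drop the -{c} suffix if identical kwargs must yield identical hashcodes. Second, Python's built-in hash() is salted per process for strings (see PYTHONHASHSEED), so the same kwargs can hash differently tomorrow. Below is a minimal sketch of a process-stable alternative that also skips dictionaries whose hashcode already exists, per the requirement in the question; the helper name stable_hash is illustrative.

import hashlib
import json

def stable_hash(kwargs):
    # json.dumps with sort_keys=True produces one canonical string per dict,
    # so equal kwargs hash identically in every process and on every date
    return hashlib.sha256(
        json.dumps(kwargs, sort_keys=True).encode('utf-8')
    ).hexdigest()

seen = set()  # e.g. pre-load with hashcodes already stored in the delta table
result_list = []
for c, l in enumerate(results, start=1):
    h = stable_hash(l['kwargs'])
    if h in seen:
        continue  # a row with this uniquehashcode already exists, skip it
    seen.add(h)
    for key, value in l['kwargs'].items():
        values = value if isinstance(value, list) else [value]
        for i, v in enumerate(values, start=1):
            idx = str(i) if isinstance(value, list) and len(value) > 1 else 'null'
            result_list.append([str(c), key, v, idx, h])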
