How to merge lists of dictionaries and convert them into multiple dataframes in Python?
I have several lists of dictionaries and I want to convert them into dataframes. I first used update to convert a list of dictionaries into a dictionary of dictionaries and then used pd.concat to concatenate each dictionary.
I grouped the data by hospital ID, and each list has two dictionaries. Within each dictionary, there's a dataframe with the columns 'hospital', 'patientID', and 'results'.
# Hospital35006 Adults Test results
diabetes_35006 =
[{'hospital': [35006, 35006], 'patientID': [0001, 0002], 'results': [0,1]}] #Adult Patients(18-25yrs)
[{'hospital': [35006, 35006], 'patientID': [0003, 0004], 'results': [1,0]}] #Adult Patients(25-30yrs)
# Hospital35007 Adults Test results
diabetes_35007 =
[{'hospital': [35007, 35007], 'patientID': [0001, 0002], 'results': [0,1]}] #Adult Patients(18-25yrs)
[{'hospital': [35007, 35007], 'patientID': [0003, 0004], 'results': [1,0]}] #Adult Patients(25-30yrs)
def resultDF(test_results):
    adults_test_results = {}
    for results in test_results:
        adults_test_results.update(results)  # Concatenate two adults test results in diabetes_35006 & diabetes_35007
    dataframe = pd.concat(adults_test_results, ignore_index = True)
    return dataframe
hospital_35006 = resultDF(diabetes_35006)
hospital_35007 = resultDF(diabetes_35007)
Since I also have test results from another 10 hospitals, is there something I can add to my code to generate the dataframes more efficiently, rather than writing hospital_35006 = resultDF(diabetes_35006), etc. each time?
I think best practice would be to have a dictionary of the input data keyed by hospital ID ("diabetes") and then convert it to a dictionary of dataframes ("hospital") using a dictionary comprehension.
This link can be useful to avoid the temptation of dynamically generating variables based on strings: http://stupidpythonideas.blogspot.co.uk/2013/05/why-you-dont-want-to-dynamically-create.html
I assume (?) the correct input you have is a list of dicts per hospital:
# Hospital 35006 adults test results
# (patient IDs are written as 1, 2, ...; leading zeros such as 0001 are a syntax error in Python 3)
diabetes_35006 = [
    {'hospital': [35006, 35006], 'patientID': [1, 2], 'results': [0, 1]},  # Adult patients (18-25 yrs)
    {'hospital': [35006, 35006], 'patientID': [3, 4], 'results': [1, 0]},  # Adult patients (25-30 yrs)
]
# Hospital 35007 adults test results
diabetes_35007 = [
    {'hospital': [35007, 35007], 'patientID': [1, 2], 'results': [0, 1]},  # Adult patients (18-25 yrs)
    {'hospital': [35007, 35007], 'patientID': [3, 4], 'results': [1, 0]},  # Adult patients (25-30 yrs)
]
First, your function to convert a list of dictionaries to a DataFrame can be simplified:
import pandas as pd

def resultDF(test_results):
    return pd.concat([pd.DataFrame(res) for res in test_results]).reset_index(drop=True)
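To see what that one-liner does step by step, here is a minimal sketch. The sample data is hypothetical and only mirrors the structure shown above, and the variable names are illustrative. Each dict of equal-length lists becomes one DataFrame, pd.concat stacks them, and the index is then renumbered; passing ignore_index=True to pd.concat should give the same result in one call.

```python
import pandas as pd

# Hypothetical sample data with the same structure as the lists above
age_groups = [
    {'hospital': [35006, 35006], 'patientID': [1, 2], 'results': [0, 1]},
    {'hospital': [35006, 35006], 'patientID': [3, 4], 'results': [1, 0]},
]

# One DataFrame per dictionary (keys become columns)
frames = [pd.DataFrame(group) for group in age_groups]

# Stacking keeps each frame's own row labels, so they repeat
stacked = pd.concat(frames)

# Renumber the rows from zero, discarding the old labels
renumbered = stacked.reset_index(drop=True)

# Alternative spelling: let concat renumber the rows itself
same_result = pd.concat(frames, ignore_index=True)
```

Either spelling works here; ignore_index=True simply avoids the separate reset_index step.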
Then, what I am suggesting is to group all the results into a dictionary and convert them all to DataFrames in one go:
test_dict = {35006: diabetes_35006,
             35007: diabetes_35007}
# Use .items() in Python 3 (.iteritems() exists only in Python 2)
res_dict = {key: resultDF(el) for key, el in test_dict.items()}
So that you have:
res_dict[35006]
Out[64]:
   hospital  patientID  results
0     35006          1        0
1     35006          2        1
2     35006          3        1
3     35006          4        0
and, for res_dict[35007]:
   hospital  patientID  results
0     35007          1        0
1     35007          2        1
2     35007          3        1
3     35007          4        0
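One practical benefit of keeping the DataFrames in a dictionary is that any per-hospital step becomes a loop or comprehension over the keys, however many hospitals there are. The following is a self-contained sketch under assumptions: the input data is hypothetical, and the positives summary is only an illustration, not something the question asked for.

```python
import pandas as pd

def resultDF(test_results):
    # Build one DataFrame per dictionary and stack them with a fresh index
    return pd.concat([pd.DataFrame(res) for res in test_results], ignore_index=True)

# Hypothetical input: one list of dictionaries per hospital ID
test_dict = {
    35006: [{'hospital': [35006, 35006], 'patientID': [1, 2], 'results': [0, 1]},
            {'hospital': [35006, 35006], 'patientID': [3, 4], 'results': [1, 0]}],
    35007: [{'hospital': [35007, 35007], 'patientID': [1, 2], 'results': [0, 1]},
            {'hospital': [35007, 35007], 'patientID': [3, 4], 'results': [1, 0]}],
}

# One DataFrame per hospital, keyed by hospital ID
res_dict = {key: resultDF(el) for key, el in test_dict.items()}

# Illustrative per-hospital step: count the rows where results == 1
positives = {hospital_id: int(df['results'].sum())
             for hospital_id, df in res_dict.items()}
```

Adding another hospital then only means adding one more entry to test_dict; no new variable names are needed.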