Creating a Pandas DataFrame from a list of dictionaries, with each dictionary as a row?
I have been through several posts, but I can't work out how to use each dictionary in a list of dictionaries to create a row in a pandas DataFrame. Specifically, I have two issues that my limited experience with dictionaries can't work around.
Example list of dictionaries (>500k in total):
pep_list=[{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',
'gene': 'HV404',
'aa_comp': {'W': 4,
'V': 5,
'L': 5,
'S': 10,
'Q': 3,
'E': 1,
'G': 5,
'P': 2,
'K': 1,
'T': 2,
'C': 1,
'A': 1,
'I': 1,
'N': 1,
'R': 1},
'peptide': ['WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'],
'Length': 43,
'z': 3,
'Mass': 4557,
'm/z': 1519.0},
{'A0A0G2JNQ3': 'ISGNTSR',
'gene': 'A0A0G2JNQ3',
'aa_comp': {'I': 1, 'S': 2, 'G': 1, 'N': 1, 'T': 1, 'R': 1},
'peptide': ['ISGNTSR'],
'Length': 7,
'z': 2,
'Mass': 715,
'm/z': 357.5},etc.]
Expected output:
df = pd.DataFrame(<values from dictionaries>, columns=["id", "gene", "aa_comp", "peptide", "length", "z", "mass", "m/z"])
| id | columns of keys |
|---|---|
| dictionary 1 | values in separate columns |
| dictionary 2 | values in separate columns |
Thank you for any insight!
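Whatever these entries are:
{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'}
{'A0A0G2JNQ3': 'ISGNTSR'}
they are what is messing it up. Because every record uses a different key for this entry, each record adds its own column that is empty (NaN) in all the other rows. They also don't look necessary, since the same information is repeated under 'gene' and 'peptide'.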
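To see the problem concretely, here is a small sketch using the two sample records, trimmed (the 'aa_comp' and 'peptide' fields are left out only to keep it short). pd.DataFrame builds its columns from the union of all keys, so each ID key turns into a column of its own.

```python
import pandas as pd

# Two records from the question, trimmed: 'aa_comp' and 'peptide' are omitted
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404",
     "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3",
     "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

# pd.DataFrame takes the union of all keys as its columns
df = pd.DataFrame(pep_list)

# Each ID key became its own column, filled in for one row and NaN elsewhere
print(df[["gene", "HV404", "A0A0G2JNQ3"]])
```

With 500k records, that means up to 500k mostly empty columns.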
If you want to drop the non-representative keys, you can do something like this:
import pandas as pd
# Keys shared by the first two records; the per-record ID keys differ, so they drop out
key_intersect = set(pep_list[0]).intersection(pep_list[1])
new_list_of_dictionaries = [{key: value for key, value in d.items() if key in key_intersect} for d in pep_list]
df = pd.DataFrame(new_list_of_dictionaries)
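Pretty compact code, but you could unroll it into loops if needed. Beware of instead blindly dropping the first key of each dictionary: that only works if every dictionary was built with its keys in the same order. Plain dicts preserve insertion order from Python 3.7 on (on older versions you would need an OrderedDict), but that order is only as consistent as the code that created the dictionaries.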
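If you would rather keep each record's ID as the "id" column from the question's expected output instead of dropping it, here is a minimal sketch. It assumes, based only on the two sample records, that the ID key always has the same name as the record's 'gene' value (so in this data "id" simply repeats "gene"). The records are trimmed ('aa_comp' and 'peptide' omitted) to keep it short.

```python
import pandas as pd

# Two records from the question, trimmed: 'aa_comp' and 'peptide' are omitted
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404",
     "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3",
     "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

rows = []
for record in pep_list:
    row = dict(record)  # copy so the original records are left untouched
    # ASSUMPTION: the per-record ID key has the same name as the 'gene' value
    # (true for both sample records); pop() removes it if present
    row.pop(record["gene"], None)
    row["id"] = record["gene"]
    rows.append(row)

df = pd.DataFrame(rows)
# Lower-case names as in the question's expected output, with 'id' first
df = df.rename(columns={"Length": "length", "Mass": "mass"})
df = df[["id"] + [c for c in df.columns if c != "id"]]
print(df)
```

Unlike the key-intersection version, this does not depend on the first two records being representative, but it does depend on the ID-key naming assumption holding for every record.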
You can try this:
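df = pd.DataFrame.from_dict(dict(enumerate(pep_list)), orient='index').reset_index()
With orient='index', the outer dictionary's keys become the row index and each record's keys become the columns. from_dict expects a dict rather than a list, so the list is first keyed by position with enumerate; passing pep_list directly with orient='index' fails. reset_index moves the index into a regular column, although you may not need it in your case.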
After that, you can keep only the columns you want.
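A minimal end-to-end sketch of this answer on the two sample records (trimmed, 'aa_comp' and 'peptide' omitted). The records are keyed by position because from_dict with orient="index" needs a dict; the wanted column list is only an example choice.

```python
import pandas as pd

# Two records from the question, trimmed: 'aa_comp' and 'peptide' are omitted
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404",
     "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3",
     "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

# from_dict(orient="index") expects a dict of records, so key each record by position
df = pd.DataFrame.from_dict(dict(enumerate(pep_list)), orient="index")

# Keep only the columns you want; the per-record ID columns are left out
wanted = ["gene", "Length", "z", "Mass", "m/z"]
df = df[wanted]
print(df)
```

Passing the list straight to pd.DataFrame(pep_list) builds the same table; the from_dict route is mainly convenient when the data is already a dict of records keyed by ID.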