Creating a Pandas DataFrame from list of dictionaries? Each dictionary as row in DataFrame?

I have been through several posts, but I cannot work out how to use each dictionary in a list of dictionaries to create a row in a pandas DataFrame. Specifically, I have two issues that my limited experience with dictionaries has not let me work around.

  1. So far I have separated each key and value into two columns; however, what I am looking for is to create a row for each dictionary and use the keys as the column names.
  2. Only the first key in each dictionary is unique, so I would like to either drop it completely or use the key only as a value to populate a column named "id".

Example List of Dictionaries (>500k in total):

pep_list = [
    {'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',
     'gene': 'HV404',
     'aa_comp': {'W': 4, 'V': 5, 'L': 5, 'S': 10, 'Q': 3, 'E': 1, 'G': 5, 'P': 2,
                 'K': 1, 'T': 2, 'C': 1, 'A': 1, 'I': 1, 'N': 1, 'R': 1},
     'peptide': ['WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'],
     'Length': 43,
     'z': 3,
     'Mass': 4557,
     'm/z': 1519.0},
    {'A0A0G2JNQ3': 'ISGNTSR',
     'gene': 'A0A0G2JNQ3',
     'aa_comp': {'I': 1, 'S': 2, 'G': 1, 'N': 1, 'T': 1, 'R': 1},
     'peptide': ['ISGNTSR'],
     'Length': 7,
     'z': 2,
     'Mass': 715,
     'm/z': 357.5},
    # etc.
]

Expected output:

Dataframe = pd.DataFrame({values from dictionaries}, columns=["id", "gene", 'aa_comp', 'peptide', 'length', 'z', 'mass', 'm/z'])

id            columns of keys
dictionary 1  values in separate columns
dictionary 2  values in separate columns

Thank you for any insight!

Whatever these things are

{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',}
{'A0A0G2JNQ3': 'ISGNTSR',}

are messing it up, and they don't look necessary anyway because the same information is repeated under 'gene' and 'peptide'.
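To see why those keys get in the way, here is a small sketch using two records trimmed from the question's example (fewer fields, same shape). Passing the list straight to `pd.DataFrame` gives every distinct key its own column, so each unique ID key becomes a column that is empty for every other row. The name `raw_df` is just for illustration.

```python
import pandas as pd

# Two records trimmed from the question's example (fewer fields, same shape).
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR",
     "gene": "HV404", "Length": 43, "z": 3},
    {"A0A0G2JNQ3": "ISGNTSR",
     "gene": "A0A0G2JNQ3", "Length": 7, "z": 2},
]

# Every distinct key across the records becomes a column.
raw_df = pd.DataFrame(pep_list)

# The per-record ID keys show up as extra columns filled with NaN elsewhere.
print(raw_df.columns.tolist())
print(raw_df.isna().sum())
```

With more than 500k records this would add one mostly empty column per unique key, which is why the key is worth removing before building the DataFrame.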

If you want to take out a non-representative key, you can do something like this:

import pandas as pd

# Keep only the keys shared by the first two dictionaries, which drops each record's unique key.
key_intersect = set(pep_list[0].keys()).intersection(set(pep_list[1].keys()))
new_list_of_dictionaries = [{key: value for (key, value) in dicts.items() if key in key_intersect} for dicts in pep_list]
df = pd.DataFrame(new_list_of_dictionaries)

Pretty compact code, but you could unroll it into loops if needed. Beware of blindly taking out the first element: plain dictionaries only keep insertion order from Python 3.7 onward, and even then the unique key comes first only if every dictionary was built that way.
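If you would rather keep the unique key as an "id" column than drop it, a variation along these lines might work. This is only a sketch: `SHARED_KEYS` and `to_row` are hypothetical names, and it assumes every record has exactly one key outside the shared set. The sample records are trimmed from the question's example.

```python
import pandas as pd

# Two records trimmed from the question's example (fewer fields, same shape).
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR",
     "gene": "HV404", "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR",
     "gene": "A0A0G2JNQ3", "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

# Assumption: these are the keys every record shares; any other key is the unique ID key.
SHARED_KEYS = {"gene", "Length", "z", "Mass", "m/z"}


def to_row(record):
    """Return a new dict with the record's unique key stored under 'id'."""
    extra = [key for key in record if key not in SHARED_KEYS]
    if len(extra) != 1:
        raise ValueError(f"expected exactly one unique key, found {extra}")
    row = {"id": extra[0]}
    # Copy the shared fields in their original order; the unique key's value is dropped.
    row.update((key, value) for key, value in record.items() if key in SHARED_KEYS)
    return row


df = pd.DataFrame([to_row(record) for record in pep_list])
print(df)
```

Looking the key up by name rather than by position avoids relying on dictionary order, and the `ValueError` flags any record that does not fit the assumed shape.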

You can try this:

df = pd.DataFrame.from_dict(pep_list, orient='index').reset_index()

With orient='index', the keys become the row index of the DataFrame, and reset_index then moves that index into a regular column, although it may not be needed in your case.

After that, you can select just the columns you want.
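`DataFrame.from_dict` is documented to take a dict rather than a list, so passing `pep_list` directly may fail. One way to apply the same idea is to key each record by its ID first. The sketch below assumes the 'gene' value is unique per record and matches the name of the unique key, as in the question's two examples; `records_by_id` is a hypothetical name, and the sample records are trimmed from the question.

```python
import pandas as pd

# Two records trimmed from the question's example (fewer fields, same shape).
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR",
     "gene": "HV404", "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR",
     "gene": "A0A0G2JNQ3", "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

# Assumption: record["gene"] is unique and equals the name of the record's unique key.
# Build {id: remaining fields}, leaving the unique key out of the inner dict.
records_by_id = {
    record["gene"]: {k: v for k, v in record.items() if k != record["gene"]}
    for record in pep_list
}

# orient="index" turns each outer key into a row label;
# rename_axis names that index "id" and reset_index moves it into a column.
df = (
    pd.DataFrame.from_dict(records_by_id, orient="index")
    .rename_axis("id")
    .reset_index()
)

# Select only the columns of interest.
print(df[["id", "Length", "z", "Mass", "m/z"]])
```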
