Merge two pandas dataframes, as lists in every cell

I want to merge two dataframes so that every cell of the resulting dataframe holds a list. I'm completely lost on how to do this.

My current solution uses the index of each dataframe to build a dict (e.g. dict[index[0]]['DEPTH'] = []), then loops over the rows of the dataframes to append to the dict entries (e.g. dict[index[0]]['DEPTH'].append(cell_value)), but I suspect that's very inefficient and slow.
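For reference, the loop-based approach described above might look roughly like the sketch below. The variable names (`cells`, `result`) and the sample values are illustrative assumptions, not the original code.

```python
from collections import defaultdict

import pandas as pd

# Small stand-ins for the real frames (sample values are made up).
df1 = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0]},
                   index=['chr1~10007022~C', 'chr1~10007023~T'])
df2 = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0]},
                   index=['chr1~10007022~C', 'chr1~10007023~T'])

# cells[index_label][column] -> list of values seen for that cell
cells = defaultdict(lambda: defaultdict(list))
for frame in (df1, df2):
    # iterrows() visits one row at a time, which is what makes this slow
    for label, row in frame.iterrows():
        for column, value in row.items():
            cells[label][column].append(value)

# Convert the nested dict back into a dataframe of lists.
plain = {label: dict(columns) for label, columns in cells.items()}
result = pd.DataFrame.from_dict(plain, orient='index')
```

Every cell is touched in a Python-level loop, so the cost grows with rows times columns, which is likely why it feels slow on large inputs.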

Does a pandas solution exist that would get this done?

  • df1 would look like this:

[image]

  • df2 would look like this:

[image]

  • Resulting df would look something like this:
                    DEPTH        A
chr1~10007022~C    [1, 1]      [0, 0]
chr1~10007023~T    [1, 1]      [0, 0]
                  .
                  .
                  .
chr1~10076693~T    [1, 1]      [0, 0]

Keep in mind:

  • the indexes of the two dataframes will probably differ, but not always.
  • each dataframe will probably contain >100M rows

You could concatenate the two, group by the index, and then aggregate with list.

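import pandas as pd

# First frame, indexed by the item identifier.
df = pd.DataFrame({'item': ['chr1-10007022-C', 'chr1-10007023-T'],
                   'DEPTH': [1, 1],
                   'A': [0, 0],
                   'C': [0, 0]})
df = df.set_index('item')

# Second frame; only the first identifier overlaps with df.
df2 = pd.DataFrame({'item': ['chr1-10007022-C', 'chr1-10007026-X'],
                    'DEPTH': [1, 1],
                    'A': [0, 0],
                    'C': [0, 0]})
df2 = df2.set_index('item')

# Stack both frames, group rows that share an index label,
# and collect each column's values into a list.
out = pd.concat([df, df2]).groupby(level=0).agg(list)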

Output

                  DEPTH       A       C
item                                   
chr1-10007022-C  [1, 1]  [0, 0]  [0, 0]
chr1-10007023-T     [1]     [0]     [0]
chr1-10007026-X     [1]     [0]     [0]
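
In the output above, rows that exist in only one frame end up with one-element lists, so it is no longer clear which frame a value came from. If every list should have one slot per frame, one possible variation is to reindex both frames to the union of their indexes before concatenating, so that missing entries appear as NaN. This is a sketch, not part of the original answer: the helper name `merge_as_lists` is hypothetical, the sample values are made up, and it assumes each frame's index has no duplicate labels.

```python
import pandas as pd


def merge_as_lists(left, right):
    """Return a frame whose cells are [left_value, right_value].

    Hypothetical helper. Assumes both indexes are free of duplicates;
    labels missing from one frame are filled with NaN in that slot.
    """
    # Union of both indexes, so every label is present in both frames.
    full_index = left.index.union(right.index)
    left_aligned = left.reindex(full_index)
    right_aligned = right.reindex(full_index)
    # After stacking, each label occurs exactly twice: first the row
    # from `left`, then the row from `right`. groupby keeps that order
    # within each group.
    stacked = pd.concat([left_aligned, right_aligned])
    return stacked.groupby(level=0).agg(list)


df = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0]},
                  index=['chr1-10007022-C', 'chr1-10007023-T'])
df2 = pd.DataFrame({'DEPTH': [2, 3], 'A': [5, 6]},
                   index=['chr1-10007022-C', 'chr1-10007026-X'])

out = merge_as_lists(df, df2)
```

Note that introducing NaN converts integer columns to floats, and that list-valued cells are stored as Python objects, which may be costly in memory at the >100M row scale mentioned in the question.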
