
Merge two pandas dataframes, as lists in every cell

I want to merge 2 dataframes, with the resulting dataframe having a list in every single cell. I'm completely lost on how to do this.

My current solution uses the index of each dataframe to build a dict (e.g. dict[index[0]]['DEPTH'] = []), then loops over the rows of both dataframes to append to the dict keys (e.g. dict[index[0]]['DEPTH'].append(cell_value)). I suspect that is very inefficient and slow.
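For illustration, the dict-based loop described above might look roughly like the following minimal sketch. The helper name `merge_as_lists` and the two small sample frames are hypothetical, made up only to show the shape of the approach.

```python
from collections import defaultdict

import pandas as pd


def merge_as_lists(frames):
    # Hypothetical helper: index label -> column name -> list of cell values.
    merged = defaultdict(lambda: defaultdict(list))
    for frame in frames:
        # Python-level loop over every row of every frame; this is the slow part.
        for label, row in frame.iterrows():
            for column, value in row.items():
                merged[label][column].append(value)
    # Convert to plain dicts, then build a frame with index labels as rows.
    plain = {label: dict(columns) for label, columns in merged.items()}
    return pd.DataFrame.from_dict(plain, orient="index")


# Made-up sample data, not the real dataframes from the question.
df1 = pd.DataFrame({"DEPTH": [1, 1], "A": [0, 0]},
                   index=["chr1~10007022~C", "chr1~10007023~T"])
df2 = pd.DataFrame({"DEPTH": [1, 1], "A": [0, 0]},
                   index=["chr1~10007022~C", "chr1~10007023~T"])

result = merge_as_lists([df1, df2])
print(result)
```

Every cell is visited once in interpreted Python, which is why this is unlikely to scale to frames with more than 100M rows.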

Does a pandas solution exist that would get this done?

  • df1 would look like this:

[image of df1 not available]

  • df2 would look like this:

[image of df2 not available]

  • Resulting df would look something like this:
                    DEPTH        A
chr1~10007022~C    [1, 1]      [0, 0]
chr1~10007023~T    [1, 1]      [0, 0]
                  .
                  .
                  .
chr1~10076693~T    [1, 1]      [0, 0]

Keep in mind:

  • The indexes of the two dataframes will usually differ, but not always.
  • Each dataframe will probably contain more than 100M rows.

You could concatenate the two dataframes, group by the index (the item), and then aggregate each column with list.

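import pandas as pd

# First frame, indexed by item
df = pd.DataFrame({'item': ['chr1-10007022-C', 'chr1-10007023-T'],
                   'DEPTH': [1, 1],
                   'A': [0, 0],
                   'C': [0, 0]})
df = df.set_index('item')

# Second frame: shares one item with df and adds a new one
df2 = pd.DataFrame({'item': ['chr1-10007022-C', 'chr1-10007026-X'],
                    'DEPTH': [1, 1],
                    'A': [0, 0],
                    'C': [0, 0]})
df2 = df2.set_index('item')

# Stack the frames, group rows by index label, and collect each column's values into a list
out = pd.concat([df, df2]).groupby(level=0).agg(list)
print(out)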

Output

                  DEPTH       A       C
item                                   
chr1-10007022-C  [1, 1]  [0, 0]  [0, 0]
chr1-10007023-T     [1]     [0]     [0]
chr1-10007026-X     [1]     [0]     [0]
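As the output shows, an item present in only one dataframe ends up with a one-element list. If every cell should instead hold one entry per dataframe, one possible variation (an assumption about what is wanted, not something the question states) is to reindex both frames to the union of their indexes before concatenating. A minimal sketch using the same sample data:

```python
import pandas as pd

# Same sample frames as in the answer above.
df = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0], 'C': [0, 0]},
                  index=pd.Index(['chr1-10007022-C', 'chr1-10007023-T'], name='item'))
df2 = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0], 'C': [0, 0]},
                   index=pd.Index(['chr1-10007022-C', 'chr1-10007026-X'], name='item'))

# Union of both indexes, so every item appears in both frames.
all_items = df.index.union(df2.index)

# Reindexing adds NaN rows for items a frame lacks (integer columns become float).
aligned = [frame.reindex(all_items) for frame in (df, df2)]

# Same concat + groupby + agg(list) step as before, now on the aligned frames.
out = pd.concat(aligned).groupby(level=0).agg(list)
print(out)
```

Each list then has two entries, in the order the frames were passed to concat, with NaN marking the frame that lacked the item. The trade-off is extra memory for the filler rows, which may matter at the sizes mentioned in the question.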
