简体   繁体   中英

Explode multiple lists of same length in DataFrame

I have a Pandas DataFrame with several lists in columns that I would like to split. Each list has the same length and they have to be split at the same indices.

What I have now uses a suggestion from here but I cannot make it work:

import numpy as np
import pandas as pd
from itertools import chain

split_size = 2

def split_list(arr, keep_partial=False):
    arrs = []
    while len(arr) >= split_size:
        sub = arr[:split_size]
        arrs.append(sub)
        arr   = arr[split_size:]
    if keep_partial:
        arrs.append(arr)
    return arrs


df = pd.DataFrame({'id': [1, 2, 3], 't': [[1,2,3,4], [1,2,3,4,5,6], [0,2]], 'v': [[0,-1,1,0], [0,-1,1,0,2,-2], [0,0]]})

def chainer(lst):
    return list(chain.from_iterable(split_list(lst, split_size)))

def chain_col(col):
    return col.apply(lambda x: chainer(x))

lens = df.t.apply(lambda x: len(split_list(x)))

pd.DataFrame({'id': np.repeat(df.id, lens), 't': chain_col(df.t), 'v': chain_col(df.v)})

The problem is that it repeats each full list rather than splits it across lines. I think the issue is the usage of chain.from_iterable but without it I simply get the list of lists (ie split lists) repeated rather than each split to its own row in the DataFrame.

My data set is not very large (a few thousand rows), so if there is a better way I'd be happy to learn. I looked at explode but that seems to split the data set based on a single column and I want multiple columns to be split in the same way.

My desired output is for id = 1 is

1. a row with t = [1,2] and v = [0,-1]
2. another row with t = [3,4] = [1,0]

Ideally I'd add a sub-index to each 'id' (eg 1 -> 1.1 and 1.2, so I can distinguish them) but that's a cosmetic thing, not my main problem.

IIUC, here is one way using a funstion which splits lists to n chunks , then applymap to split each cell , followed by explode and concat :

def split_lists(l, n):
    """splits a list to n chunks"""
    for i in range(0, len(l), n):  
        yield l[i:i + n]  

def explode_multiple(x):
    """This will use the prev func, 
       explode each columns and concat them to a dataframe"""
    m=x.applymap(lambda x: [*split_lists(x,2)])
    m=pd.concat([m.explode(i).loc[:,i] for i in m.columns],axis=1).reset_index()
    return m

explode_multiple(df.set_index('id')) #setting id as index since other columns have list

   id       t        v
0   1  [1, 2]  [0, -1]
1   1  [3, 4]   [1, 0]
2   2  [1, 2]  [0, -1]
3   2  [3, 4]   [1, 0]
4   2  [5, 6]  [2, -2]
5   3  [0, 2]   [0, 0]

Using explode , pd.concat and GroupBy :

note: this answer uses the new explode method only available from pandas>=0.25.0

d1 = df.explode('t').drop(columns='v')
d2 = df.explode('v').drop(columns=['id', 't'])

df2 = pd.concat([d1,d2], axis=1)
df2

s = df2.groupby('id')['id'].cumcount()//2

final = df2.groupby(['id', s]).agg({'t':list,
                                    'v':list}).reset_index(level=0)

final['id'] = final['id'].astype(str).str.cat('.'+final.groupby('id').cumcount().add(1).astype(str))

Output

    id       t        v
0  1.1  [1, 2]  [0, -1]
1  1.2  [3, 4]   [1, 0]
0  2.1  [1, 2]  [0, -1]
1  2.2  [3, 4]   [1, 0]
2  2.3  [5, 6]  [2, -2]
0  3.1  [0, 2]   [0, 0]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM