Unpack lists and tuples from DataFrame

Question

The cells in my DataFrame have an odd format: the data is stored in nested lists and tuples. I'd like to unpack the values and split them into rows. At the moment I have the following DataFrame:

import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

print(df)
    Filename    RGB                                                                             RGB_type
0   A           [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])]         [r, g, b]
1   B           [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]         [r, g, b]

I would like to get it into this format:

     Filename    Top 1 colour    Top 1 frequency    Top 2 colour    Top 2 frequency  rgb
0    A           0               1650               6               39               r
0    A           0               1691               1               59               g
0    A           50              1402               49              187              b
1    B           0               1423               16              38               r
1    B           0               1445               16              46               g
1    B           0               1419               16              39               b

I have been able to access the first list with df.RGB.apply(pd.Series), but now I'm not sure how to proceed.

Answer 1

Here's one approach:

from itertools import chain
import numpy as np
import pandas as pd
# flatten the lists into an array and reshape into 4 columns
a = np.array(list(chain.from_iterable(df.RGB.values)))
out = pd.DataFrame(a.reshape(-1, 4),
                   columns=['Top 1 colour', 'Top 1 frequency',
                            'Top 2 colour', 'Top 2 frequency'])
# explode the remaining columns and assign back to the new dataframe
out.assign(**df.explode('RGB_type')[['Filename', 'RGB_type']]
               .reset_index(drop=True))

        Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency Filename  \
0             0             1650             6               39        A   
1             0             1691             1               59        A   
2            50             1402            49              187        A   
3             0             1423            16               38        B   
4             0             1445            16               46        B   
5             0             1419            16               39        B   

     RGB_type  
0        r  
1        g  
2        b  
3        r  
4        g  
5        b
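
The reshape step is easier to trust once the intermediate array shape is visible. The sketch below rebuilds the question's DataFrame and checks that the flattened cells form a regular 4-dimensional array (files, channels, pairs, pair members) that collapses to six rows of four values. It assumes every cell holds exactly three channels with two (colour, frequency) pairs each; ragged data would not form a regular array, so this is an illustration rather than a general solution.

```python
from itertools import chain

import numpy as np
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

# Each cell is a one-element list holding a tuple of three channels;
# chaining the cells strips that outer list.
a = np.array(list(chain.from_iterable(df.RGB.values)))
print(a.shape)   # (2, 3, 2, 2): files, channels, pairs, (colour, frequency)

# Collapse to one row per (file, channel) with four values each.
flat = a.reshape(-1, 4)
print(flat.shape)  # (6, 4)
print(flat[2])     # third channel of file A
```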

Answer 2

Another approach: flatten each list of tuples by summing them, then expand the result into individual columns.

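# flatten each channel's [(colour, freq), (colour, freq)] into [colour, freq, colour, freq]
df['RGB'] = df['RGB'].apply(lambda a: [list(sum(y,())) for y in a[0]])
# repeat every row once per entry in RGB_type
df = df.reindex(df.index.repeat(df['RGB_type'].apply(len)))
# per file, expand the first cell of each column into rows
# (the scalar Filename fills only the first row, hence the NaNs below)
df = df.groupby('Filename').apply(lambda x:x.apply(lambda y: pd.Series(y.iloc[0])))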

Out:

  Filename                  RGB RGB_type
0        A     [0, 1650, 6, 39]        r
1      NaN     [0, 1691, 1, 59]        g
2      NaN  [50, 1402, 49, 187]        b
3        B    [0, 1423, 16, 38]        r
4      NaN    [0, 1445, 16, 46]        g
5      NaN    [0, 1419, 16, 39]        b
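
The flattening step relies on sum() with an empty tuple as the start value, which concatenates tuples instead of adding numbers. A minimal sketch on one channel from the question's data follows; the variable names are illustrative only, and itertools.chain is shown as an equivalent that is often preferred for longer inputs.

```python
from itertools import chain

# One channel from the question: two (colour, frequency) pairs.
pairs = [(0, 1650), (6, 39)]

# sum() starting from () concatenates: () + (0, 1650) + (6, 39)
flat = list(sum(pairs, ()))
print(flat)  # [0, 1650, 6, 39]

# chain.from_iterable gives the same result without building
# intermediate tuples.
flat_chain = list(chain.from_iterable(pairs))
```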



df.join(pd.DataFrame(df['RGB'].tolist(),
                     columns=['Top 1 colour', 'Top 1 frequency',
                              'Top 2 colour', 'Top 2 frequency'],
                     index=df.index)).drop(columns='RGB').ffill()

Out:

           Filename RGB_type  Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency
Filename
A        0        A        r             0             1650             6               39
         1        A        g             0             1691             1               59
         2        A        b            50             1402            49              187
B        0        B        r             0             1423            16               38
         1        B        g             0             1445            16               46
         2        B        b             0             1419            16               39
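
For comparison, here is a plain-Python sketch that builds the requested layout row by row, keeping the original index (0, 0, 0, 1, 1, 1) and naming the last column rgb as in the question. It is not either answerer's method, just an alternative that avoids groupby and reshape; the helper column idx and the loop variable names are made up for illustration, and it assumes each channel holds exactly two (colour, frequency) pairs.

```python
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

cols = ['Top 1 colour', 'Top 1 frequency', 'Top 2 colour', 'Top 2 frequency']

rows = []
for idx, rec in df.iterrows():
    channels = rec['RGB'][0]  # unwrap the one-element outer list
    for pairs, channel in zip(channels, rec['RGB_type']):
        (c1, f1), (c2, f2) = pairs  # assumes exactly two pairs per channel
        rows.append((idx, rec['Filename'], c1, f1, c2, f2, channel))

# 'idx' is a temporary column used to restore the original row labels.
out = pd.DataFrame(rows, columns=['idx', 'Filename', *cols, 'rgb']).set_index('idx')
out.index.name = None
print(out)
```

Looping over rows is slower than the vectorised answers on large frames, but it makes each unpacking step explicit and fails loudly if a cell does not have the expected shape.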

2 answers

solution1
2 2020-03-07 16:59:56

solution2
1 ACCEPTED 2020-03-07 17:21:09
