Unpack lists and tuples from DataFrame

The cells in my DataFrame have an odd format where the data is stored in lists and tuples. I'd like to unpack the values and split them to rows. At the moment I have the following DataFrame:

import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

print(df)
    Filename    RGB                                                                             RGB_type
0   A           [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])]         [r, g, b]
1   B           [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]         [r, g, b]

I would like to get it into this format:

     Filename    Top 1 colour    Top 1 frequency    Top 2 colour    Top 2 frequency  rgb
0    A           0               1650               6               39               r
0    A           0               1691               1               59               g
0    A           50              1402               49              187              b
1    B           0               1423               16              38               r
1    B           0               1445               16              46               g
1    B           0               1419               16              39               b

I have been able to access the first list with df.RGB.apply(pd.Series), but I'm not sure how to proceed from there.

Here's one approach:

from itertools import chain
import numpy as np
# flatten the lists into an array and reshape into 4 columns
a = np.array(list(chain.from_iterable(df.RGB.values)))
out = pd.DataFrame(a.reshape(-1,4), 
                   columns=['Top 1 colour','Top 1 frequency',
                            'Top 2 colour','Top 2 frequency'])
# explode the remaining columns and assign back to the new dataframe
out.assign(**df.explode('RGB_type')[['Filename', 'RGB_type']]
               .reset_index(drop=True))

   Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency Filename RGB_type
0             0             1650             6               39        A        r
1             0             1691             1               59        A        g
2            50             1402            49              187        A        b
3             0             1423            16               38        B        r
4             0             1445            16               46        B        g
5             0             1419            16               39        B        b
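
To see why reshape(-1, 4) lines the values up, it can help to inspect the intermediate array. The following is only a sketch, assuming the same data as in the question and that every cell has the same nesting depth (otherwise NumPy would not produce a regular 4-D array):

```python
from itertools import chain

import numpy as np
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

# Each RGB cell is a one-element list; chaining strips that outer list,
# leaving one tuple of three [(colour, freq), (colour, freq)] lists per file.
a = np.array(list(chain.from_iterable(df.RGB.values)))
print(a.shape)            # (2, 3, 2, 2): file x channel x rank x (colour, frequency)

# Collapsing the first two axes and the last two axes gives one row per
# (file, channel) and four values per row.
flat = a.reshape(-1, 4)
print(flat.shape)         # (6, 4)
print(flat[2].tolist())   # [50, 1402, 49, 187]
```

Because the rows come out in file-then-channel order, they match the row order produced by df.explode('RGB_type').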

Another approach is to flatten each channel's tuples by summing them, repeat each row once per channel, and then expand the result into individual columns:

df['RGB'] = df['RGB'].apply(lambda a: [list(sum(y,())) for y in a[0]])
df = df.reindex(df.index.repeat(df['RGB_type'].apply(len)))
df = df.groupby('Filename').apply(lambda x:x.apply(lambda y: pd.Series(y.iloc[0])))

Out:

    Filename    RGB RGB_type
0   A   [0, 1650, 6, 39]    r
1   NaN [0, 1691, 1, 59]    g
2   NaN [50, 1402, 49, 187] b
3   B   [0, 1423, 16, 38]   r
4   NaN [0, 1445, 16, 46]   g
5   NaN [0, 1419, 16, 39]   b
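
The sum(y, ()) call in the first line is what flattens each pair of tuples: summing tuples with an empty tuple as the start value concatenates them. A small standalone illustration, using one channel's value from the question:

```python
# One channel's value: two (colour, frequency) tuples
pairs = [(0, 1650), (6, 39)]

# sum() starts from the empty tuple and adds each tuple in turn,
# which for tuples means concatenation.
flattened = list(sum(pairs, ()))
print(flattened)  # [0, 1650, 6, 39]
```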



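Then expand the RGB lists into named columns, drop the original column, and forward-fill the missing filenames:

df.join(pd.DataFrame(df['RGB'].tolist(),
                     columns=['Top 1 colour', 'Top 1 frequency',
                              'Top 2 colour', 'Top 2 frequency'],
                     index=df.index)).drop(columns='RGB').ffill()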

Out:

            Filename RGB_type  Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency
Filename
A        0         A        r             0             1650             6               39
         1         A        g             0             1691             1               59
         2         A        b            50             1402            49              187
B        0         B        r             0             1423            16               38
         1         B        g             0             1445            16               46
         2         B        b             0             1419            16               39
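
For comparison, the same target table can also be built by looping over the nested structure directly, without NumPy or groupby. This is only a sketch, not part of either answer: it assumes every RGB cell is a one-element list, that its tuple has one entry per item in RGB_type, and that each entry holds exactly two (colour, frequency) pairs. The helper name unpack_rgb is made up for illustration.

```python
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)


def unpack_rgb(frame):
    """Hypothetical helper: build one output row per (file, channel) pair."""
    rows = []
    for idx, row in frame.iterrows():
        channels = row['RGB'][0]  # the single tuple inside the outer list
        for pairs, channel in zip(channels, row['RGB_type']):
            # Assumes exactly two (colour, frequency) pairs per channel
            (c1, f1), (c2, f2) = pairs
            rows.append((idx, row['Filename'], c1, f1, c2, f2, channel))
    cols = ['idx', 'Filename', 'Top 1 colour', 'Top 1 frequency',
            'Top 2 colour', 'Top 2 frequency', 'rgb']
    # Keep the original row label as the index, as in the desired output
    return pd.DataFrame(rows, columns=cols).set_index('idx').rename_axis(None)


result = unpack_rgb(df)
print(result)
```

This version keeps the original row labels (0, 0, 0, 1, 1, 1) as in the requested format, at the cost of a Python-level loop that will be slower than the vectorised approaches on large frames.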
