The cells in my DataFrame have an odd format: the data is stored in nested lists and tuples. I'd like to unpack the values and split them into rows. At the moment I have the following DataFrame:
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)
print(df)
Filename RGB RGB_type
0 A [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])] [r, g, b]
1 B [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])] [r, g, b]
I would like to get it into this format:
  Filename  Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency rgb
0        A             0             1650             6               39   r
0        A             0             1691             1               59   g
0        A            50             1402            49              187   b
1        B             0             1423            16               38   r
1        B             0             1445            16               46   g
1        B             0             1419            16               39   b
I have been able to access the first list with df.RGB.apply(pd.Series), but I'm not sure how to proceed from there.
Here's one approach:
from itertools import chain
import numpy as np
# flatten the lists into an array and reshape into 4 columns
a = np.array(list(chain.from_iterable(df.RGB.values)))
out = pd.DataFrame(a.reshape(-1,4),
columns=['Top 1 colour','Top 1 frequency',
'Top 2 colour','Top 2 frequency'])
# explode the remaining columns and assign back to the new dataframe
out.assign(**df.explode('RGB_type')[['Filename', 'RGB_type']]
           .reset_index(drop=True))
   Top 1 colour  Top 1 frequency  Top 2 colour  Top 2 frequency Filename RGB_type
0             0             1650             6               39        A        r
1             0             1691             1               59        A        g
2            50             1402            49              187        A        b
3             0             1423            16               38        B        r
4             0             1445            16               46        B        g
5             0             1419            16               39        B        b
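To get the exact layout asked for in the question (Filename first, the channel label last in a column named rgb, and the original row labels kept as the index), the same flatten-and-reshape idea can be wrapped up as in the sketch below. It assumes every cell of RGB has the same nesting as in the sample data (a one-element list wrapping a tuple of three channels, each with two (colour, frequency) pairs); the names cols and meta are only illustrative.

```python
from itertools import chain

import numpy as np
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

cols = ['Top 1 colour', 'Top 1 frequency', 'Top 2 colour', 'Top 2 frequency']

# Chaining the cells strips the outer one-element list, leaving an array of
# shape (rows, channels, pairs, 2); reshape gives one row of 4 values per channel.
a = np.array(list(chain.from_iterable(df['RGB'])))
out = pd.DataFrame(a.reshape(-1, 4), columns=cols)

# explode() yields one row per channel label and repeats the original index.
meta = df.explode('RGB_type')[['Filename', 'RGB_type']]
out.index = meta.index
# Assign by position (to_numpy) so the repeated index labels are not realigned.
out['Filename'] = meta['Filename'].to_numpy()
out['rgb'] = meta['RGB_type'].to_numpy()

# Reorder to match the layout requested in the question.
out = out[['Filename'] + cols + ['rgb']]
print(out)
```

Assigning by position relies on explode() and the reshape both producing rows in the same order (file by file, channel by channel), which holds for the sample data.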
Another approach: flatten each channel's tuples with sum(y, ()), repeat the rows once per channel, and then expand the flattened lists into individual columns.
df['RGB'] = df['RGB'].apply(lambda a: [list(sum(y,())) for y in a[0]])
df = df.reindex(df.index.repeat(df['RGB_type'].apply(len)))
df = df.groupby('Filename').apply(lambda x:x.apply(lambda y: pd.Series(y.iloc[0])))
Out:
Filename RGB RGB_type
0 A [0, 1650, 6, 39] r
1 NaN [0, 1691, 1, 59] g
2 NaN [50, 1402, 49, 187] b
3 B [0, 1423, 16, 38] r
4 NaN [0, 1445, 16, 46] g
5 NaN [0, 1419, 16, 39] b
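# expand the flattened RGB lists into four columns, drop the original column,
# and forward-fill the missing Filename values
df.join(pd.DataFrame(df['RGB'].tolist(),
                     columns=['Top 1 colour', 'Top 1 frequency',
                              'Top 2 colour', 'Top 2 frequency'],
                     index=df.index)).drop(columns='RGB').ffill()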
Out:
Filename RGB_type Top 1 colour Top 1 frequency Top 2 colour Top 2 frequency
Filename
A 0 A r 0 1650 6 39
1 A g 0 1691 1 59
2 A b 50 1402 49 187
B 0 B r 0 1423 16 38
1 B g 0 1445 16 46
2 B b 0 1419 16 39
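If you would rather not depend on groupby/apply at all, the same result can be built with a plain Python loop. This is not the method used in the answer above, just a minimal sketch that assumes each channel holds exactly two (colour, frequency) pairs, as in the sample data; the names rows, labels and result are illustrative.

```python
import pandas as pd

d = {'Filename': {0: 'A', 1: 'B'},
     'RGB': {0: [([(0, 1650), (6, 39)], [(0, 1691), (1, 59)], [(50, 1402), (49, 187)])],
             1: [([(0, 1423), (16, 38)], [(0, 1445), (16, 46)], [(0, 1419), (16, 39)])]},
     'RGB_type': {0: ['r', 'g', 'b'], 1: ['r', 'g', 'b']}}
df = pd.DataFrame(d)

rows = []    # one dict per output row
labels = []  # original row label for each output row

for idx, filename, rgb, types in zip(df.index, df['Filename'], df['RGB'], df['RGB_type']):
    # rgb is a one-element list wrapping a tuple with one entry per channel
    for pairs, channel in zip(rgb[0], types):
        # assumption: exactly two (colour, frequency) pairs per channel
        (c1, f1), (c2, f2) = pairs
        rows.append({'Filename': filename,
                     'Top 1 colour': c1, 'Top 1 frequency': f1,
                     'Top 2 colour': c2, 'Top 2 frequency': f2,
                     'rgb': channel})
        labels.append(idx)

result = pd.DataFrame(rows, index=labels)
print(result)
```

The loop is slower than the vectorised versions on large frames, but it makes the nesting explicit and fails with a clear unpacking error if a cell does not have the expected shape.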