Suppose the next packed df, where column_name
represents the name of the columns and row_name
the name of the rows in an unpacked dataframe:
column_name row_name value
0 col1 a 1
1 col1 b 2
2 col1 c 3
3 col2 d 4
4 col2 e 5
5 col2 f 6
6 col2 g 7
7 col2 h 8
And the unpacked dataframe looks like this:
individual col1 col2
0 1 a e
1 2 b f
2 3 c g
3 4 a h
4 5 b e
5 6 c f
6 7 a g
7 8 b h
Desired output could look like this:
individual col1 col2 value_col1 value_col2
1 a e 1 5
2 b f 2 6
3 c g 3 7
4 a h 1 8
5 b e 2 5
6 c f 3 6
7 a g 1 7
8 b h 2 8
Is there any way to make a join in order to unpack value
column into two distinct columns in the unpacked dataframe?
Reproducible data:
unpacked_df = pd.DataFrame({
'individual': [1,2,3,4,5,6,7,8],
'col1':['a','b','c','a','b','c','a','b'],
'col2':['e','f','g','h','e','f','g','h']
})
packed_df = pd.DataFrame({
'column_name': ['col1','col1','col1','col2','col2','col2','col2','col2'],
'row_name': ['a','b','c','d','e','f','g','h'],
'value':[1,2,3,4,5,6,7,8]
})
You can do a double merge
, first for col1
, then for col2
:
unpacked_df.merge(packed_df[packed_df.column_name.eq("col1")], left_on="col1", right_on="row_name").merge(packed_df[packed_df.column_name.eq("col2")], left_on="col2", right_on="row_name", suffixes=("_col1","_col2")).sort_values("individual")[["individual", "col1", "col2", "value_col1", "value_col2"]]
Result:
individual col1 col2 value_col1 value_col2
0 1 a e 1 5
6 2 b f 2 6
5 3 c g 3 7
2 4 a h 1 8
1 5 b e 2 5
7 6 c f 3 6
4 7 a g 1 7
3 8 b h 2 8
This works:
cols = packed_df['column_name'].unique()
dct = {k: [] for k in cols}
for row in unpacked_df.iloc:
for col in cols:
dct[col].append(packed_df[(packed_df['column_name'] == col) & (packed_df['row_name'] == row[col])]['value'].iloc[0])
One option, is to melt the packed df, join it to the unpacked, do a pivot/flip and some column name adjustment:
temp = (unpacked_df
.melt('individual',
var_name ='column_name',
value_name='row_name')
.merge(packed_df,
on=['column_name', 'row_name'])
.pivot('individual', 'column_name')
.rename(columns={'row_name':''})
)
# collapse the columns
temp.columns = temp.columns.map('_'.join)
# removeprefix is a python3.9 addition
temp.rename(columns=lambda col: col.removeprefix('_'))
col1 col2 value_col1 value_col2
individual
1 a e 1 5
2 b f 2 6
3 c g 3 7
4 a h 1 8
5 b e 2 5
6 c f 3 6
7 a g 1 7
8 b h 2 8
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.