What is the fastest way to do an inverse "multi-hot" (like one-hot, but with multiple simultaneous categories) operation on a large DataFrame?
I have the following DataFrame:
id type_A type_B type_C
1 1 1 0
2 0 1 0
3 0 1 1
The operation would give:
id type
1 type_A
1 type_B
2 type_B
3 type_B
3 type_C
Using melt and query:
df = df.melt(id_vars='id', value_vars=['type_A', 'type_B', 'type_C']).query('value == 1')
id variable value
0 1 type_A 1
3 1 type_B 1
4 2 type_B 1
5 3 type_B 1
8 3 type_C 1
With correct column names:
df = (
    df.melt(id_vars='id',
            value_vars=['type_A', 'type_B', 'type_C'],
            var_name='type')
      .query('value == 1')
      .drop(columns='value')
)
id type
0 1 type_A
3 1 type_B
4 2 type_B
5 3 type_B
8 3 type_C
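One detail worth knowing about the melt approach: melt emits rows column by column (all type_A cells, then all type_B cells, and so on), so in general the result is ordered by type rather than by id; in this small sample the ids only happen to come out grouped. A minimal sketch, assuming you want the id-grouped order shown in the question and a clean 0..n-1 index, adds a sort and an index reset:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})

out = (
    df.melt(id_vars='id', var_name='type')  # long format: one row per (id, column)
      .query('value == 1')                  # keep only the "hot" cells
      .drop(columns='value')
      .sort_values(['id', 'type'])          # melt orders by column, so regroup by id
      .reset_index(drop=True)               # replace the leftover melt index with 0..n-1
)
print(out)
```

Skip the sort if row order does not matter; on a large frame it is an extra O(n log n) step.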
melt should be the normal way to achieve this:
yourdf = df.melt('id').loc[lambda x: x['value'] == 1]
id variable value
0 1 type_A 1
3 1 type_B 1
4 2 type_B 1
5 3 type_B 1
8 3 type_C 1
Here is a solution with .dot, which uses matrix multiplication with the column names, helped by Series.explode() (new in pandas 0.25):
m = df.set_index('id')
m.dot(m.columns+',').str.rstrip(',').str.split(',').explode().reset_index(name='type')
id type
0 1 type_A
1 1 type_B
2 2 type_B
3 3 type_B
4 3 type_C
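To see why the .dot trick works, it helps to look at the intermediate Series. Multiplying an integer by a string repeats it (1 * 'type_A,' is 'type_A,' and 0 * 'type_A,' is ''), and the dot product adds those strings together, so each row becomes a comma-joined list of its hot column names. A small sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'type_A': [1, 0, 0],
                   'type_B': [1, 1, 1], 'type_C': [0, 0, 1]})
m = df.set_index('id')

# Row-wise "sum of products": 0/1 weights times 'type_X,' strings,
# i.e. concatenation of the names of the columns that are 1.
joined = m.dot(m.columns + ',')
print(joined)

# Strip the trailing comma, split into lists, then explode one name per row.
out = joined.str.rstrip(',').str.split(',').explode().reset_index(name='type')
print(out)
```

Note that a row with no 1s would produce an empty string and therefore one row with type '' after the split; the sample data has no such row.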
Use:
new_df = (df.set_index('id')
            .where(lambda x: x.eq(1))
            .stack()
            .rename_axis(['id', 'type'])
            .reset_index()[['id', 'type']])
print(new_df)
id type
0 1 type_A
1 1 type_B
2 2 type_B
3 3 type_B
4 3 type_C
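The where/stack approach depends on stack() dropping the NaN cells, and recent pandas releases have been revising how stack handles missing values (see the future_stack option of DataFrame.stack). A variant of the same idea that does not depend on that behaviour stacks the original 0/1 values and filters with an explicit boolean mask; this is a sketch, not a benchmarked replacement:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'type_A': [1, 0, 0],
                   'type_B': [1, 1, 1], 'type_C': [0, 0, 1]})

# stack() turns the wide frame into a Series indexed by (id, column name);
# the input has no NaN, so nothing depends on stack's NaN handling.
s = df.set_index('id').stack()

# Keep only the hot cells, then turn the surviving MultiIndex into columns.
out = s[s.eq(1)].index.to_frame(index=False, name=['id', 'type'])
print(out)
```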
df.melt(id_vars='id').query('value == 1').drop(columns='value').rename(columns={"variable": "type"})
This gives the desired result:
id type
0 1 type_A
3 1 type_B
4 2 type_B
5 3 type_B
8 3 type_C
You can replace all zeros with NaN and stack. Stacking drops all NaN values. Then you can take the resulting MultiIndex and convert it into a DataFrame:
import numpy as np

df = df.set_index('id')  # set 'id' as the index if it isn't already
df.replace(0, np.nan).stack().index.to_frame(index=False, name=['id', 'type'])
Output:
id type
0 1 type_A
1 1 type_B
2 2 type_B
3 3 type_B
4 3 type_C
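Since the question asks about speed on a large DataFrame, another option worth benchmarking on your own data is to skip the intermediate long frame and work on the underlying NumPy array with np.nonzero, which returns the (row, column) positions of the hot cells. This is a different technique from the pandas reshaping answers above, it has not been timed here, and it assumes every column except 'id' is a 0/1 indicator column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'type_A': [1, 0, 0],
                   'type_B': [1, 1, 1], 'type_C': [0, 0, 1]})

# Assumption: all columns other than 'id' are 0/1 indicator columns.
type_cols = df.columns.drop('id')
hot = df[type_cols].to_numpy() == 1

# np.nonzero returns row and column positions of the True cells in
# row-major order, so the output is already grouped by id.
rows, cols = np.nonzero(hot)

out = pd.DataFrame({
    'id': df['id'].to_numpy()[rows],       # map row positions back to ids
    'type': type_cols.to_numpy()[cols],    # map column positions to names
})
print(out)
```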