I am trying to remove the MultiIndex from the columns, but I am unable to do so.
import pandas as pd
k = pd.DataFrame([['x',2], ['y',4],['x',6]], columns=['name','value'])
agg_item={'value': [('n', 'count')]}
k=k[['name','value']].groupby(['name'],dropna=False).agg(agg_item).reset_index()
k
  name value
           n
0    x     2
1    y     1
k.columns
MultiIndex([( 'name', ''),
('value', 'n')],
)
How do I get a SQL-like table with only the 'name' and 'n' columns?
Desired output:
name n
0 x 2
1 y 1
You can use a named aggregation with pd.NamedAgg to avoid creating a MultiIndex in the first place:
n_agg = pd.NamedAgg(column='value', aggfunc='count')
k = k[['name','value']].groupby(['name'],dropna=False).agg(n=n_agg).reset_index()
Output:
>>> k
name n
0 x 2
1 y 1
Or, as @itthrill suggested, you can use .agg(n=('value', 'count')) instead of pd.NamedAgg.
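pd.NamedAgg is a named tuple of (column, aggfunc), so a plain tuple is accepted in the same position. A minimal sketch checking that both spellings give the same frame, using the sample data from the question (the variable name df is just for illustration):

```python
import pandas as pd

df = pd.DataFrame([['x', 2], ['y', 4], ['x', 6]], columns=['name', 'value'])

# Explicit NamedAgg object: output column 'n' = count of 'value' per group
via_namedagg = (df.groupby('name', dropna=False)
                  .agg(n=pd.NamedAgg(column='value', aggfunc='count'))
                  .reset_index())

# Plain (column, aggfunc) tuple shorthand for the same aggregation
via_tuple = (df.groupby('name', dropna=False)
               .agg(n=('value', 'count'))
               .reset_index())

print(via_tuple)
print(via_namedagg.equals(via_tuple))  # True
```

Either way the columns come out flat, as ['name', 'n'].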
By using a list in your dictionary, you are asking for MultiIndex columns.
You should use this syntax instead:
agg_item={'n': ('value', 'count')}
(k[['name','value']]
   .groupby(['name'], dropna=False)
   .agg(**agg_item)
   .reset_index()
)
NB. Don't forget to unpack the dictionary into keyword arguments with **.
Or without dictionary:
(k[['name','value']]
   .groupby(['name'], dropna=False)
   .agg(n=('value', 'count'))
   .reset_index()
)
Output:
name n
0 x 2
1 y 1
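To see why the list in the original dictionary is what produces the MultiIndex, here is a small sketch on the question's data (df is an illustrative name) comparing a list of aggregations with a single aggregation per column:

```python
import pandas as pd

df = pd.DataFrame([['x', 2], ['y', 4], ['x', 6]], columns=['name', 'value'])

# A list of aggregations for a column -> two-level (MultiIndex) columns
with_list = df.groupby('name', dropna=False).agg({'value': ['count']})
print(type(with_list.columns).__name__)  # MultiIndex

# A single aggregation for a column -> flat columns named after the source column
without_list = df.groupby('name', dropna=False).agg({'value': 'count'})
print(list(without_list.columns))  # ['value']

# With flat columns, a rename is enough to get the desired 'n' column
flat = without_list.rename(columns={'value': 'n'}).reset_index()
print(flat)
```

The rename step is only one option; the named-aggregation syntax above avoids it by choosing the output name directly.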
You can use a list comprehension to select levels:
k.columns = [col[0] if col[1]=='' else col[1] for col in k.columns]
You can also use the or operator instead of if/else, since the empty string '' is falsy:
k.columns = [col[1] or col[0] for col in k.columns]
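A self-contained sketch of the list-comprehension approach, starting from the question's own aggregation (df is an illustrative name for the input frame):

```python
import pandas as pd

df = pd.DataFrame([['x', 2], ['y', 4], ['x', 6]], columns=['name', 'value'])
k = (df.groupby('name', dropna=False)
       .agg({'value': [('n', 'count')]})
       .reset_index())
# After reset_index the columns are [('name', ''), ('value', 'n')]

# Keep the inner label when it is non-empty, otherwise fall back to the outer one
k.columns = [inner or outer for outer, inner in k.columns]
print(k)
```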
Or you can call droplevel before reset_index in your groupby:
k = (k[['name','value']].groupby(['name'], dropna=False)
       .agg(agg_item)
       .droplevel(0, axis=1)  # <- here: drop the outer 'value' level of the columns
       .reset_index())
Output:
name n
0 x 2
1 y 1
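One thing to keep in mind with droplevel: if several source columns use the same inner label, dropping the outer level leaves duplicate column names. A sketch assuming a hypothetical extra column 'other' (not in the question), with distinct named aggregations as one way around it:

```python
import pandas as pd

df = pd.DataFrame({'name': ['x', 'y', 'x'],
                   'value': [2, 4, 6],
                   'other': [1, None, 3]})  # 'other' is a hypothetical extra column

# Both source columns use the inner label 'n'
agg_item = {'value': [('n', 'count')], 'other': [('n', 'count')]}
res = df.groupby('name', dropna=False).agg(agg_item)

flat = res.droplevel(0, axis=1).reset_index()
print(list(flat.columns))  # ['name', 'n', 'n'] -- duplicate labels

# Named aggregation with distinct output names keeps the columns unique
named = (df.groupby('name', dropna=False)
           .agg(value_n=('value', 'count'), other_n=('other', 'count'))
           .reset_index())
print(named)
```

For the single-column case in the question, droplevel is fine as shown.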