Let's say I have the dataframe:
a|stg1
a|stg2
a|stg3
b|stg2
b|stg3
c|stg1
and I would like to get a dataframe with dummies like this:
stg1|stg2|stg3
a| 1 | 1 | 1
b| 0 | 1 | 1
c| 1 | 0 | 0
I have tried to use get_dummies from pandas, but it doesn't do the trick. I also tried building a dictionary with two for loops, and even though it works, it takes forever. There must be a more elegant and efficient solution.
Or maybe it's more of a pivot table kind of thing? If so, which function should I use? Each value pair is unique.
You can use pd.crosstab, which forms a frequency table by default:
# 0 is the column name of `a, b, c` and 1 is that of `stg*`
>>> res = pd.crosstab(df[0], df[1])
>>> res
1 stg1 stg2 stg3
0
a 1 1 1
b 0 1 1
c 1 0 0
The 1 and 0 at the top left are the names of the columns in the original dataframe; they become the names of the index and columns of the result. If they are not needed:
>>> res = res.rename_axis(index=None, columns=None)
>>> res
stg1 stg2 stg3
a 1 1 1
b 0 1 1
c 1 0 0
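For reference, here is a self-contained sketch of the crosstab approach. It assumes the dataframe has the default integer column names 0 and 1, as in the snippets above. The final clip step is an addition, not part of the original answer: crosstab counts occurrences, so clipping only matters if a pair could appear more than once.

```python
import pandas as pd

# Rebuild the example data; with no column names given,
# pandas labels the columns 0 and 1 (assumed here).
df = pd.DataFrame(
    [["a", "stg1"], ["a", "stg2"], ["a", "stg3"],
     ["b", "stg2"], ["b", "stg3"], ["c", "stg1"]]
)

# Frequency table: one row per letter, one column per stage.
res = pd.crosstab(df[0], df[1]).rename_axis(index=None, columns=None)

# crosstab returns counts; capping at 1 keeps the table a 0/1
# indicator even if some pair were repeated in the input.
dummies = res.clip(upper=1)
print(dummies)
```

Since each pair is unique in the question, res and dummies hold the same values here.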
You can use a common pivot table ('A' and 'B' are your column names):
pv = pd.pivot_table(df, index='A', columns='B', aggfunc='size', fill_value=0)
pv.index.name=None
pv.columns.name=None
print(pv)
Output:
stg1 stg2 stg3
a 1.0 1.0 1.0
b 0.0 1.0 1.0
c 1.0 0.0 0.0
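The same idea as a runnable sketch, assuming the columns are named 'A' and 'B' as above. Depending on the pandas version, the counts may come back as floats (as in the output shown), so the cast to int is added here to get plain 0/1 integers.

```python
import pandas as pd

# Example data with the assumed column names 'A' and 'B'.
df = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b", "c"],
    "B": ["stg1", "stg2", "stg3", "stg2", "stg3", "stg1"],
})

# Count rows per (A, B) pair; missing pairs are filled with 0.
pv = pd.pivot_table(df, index="A", columns="B", aggfunc="size", fill_value=0)

# Drop the axis names and normalize the dtype to integers.
pv = pv.rename_axis(index=None, columns=None).astype(int)
print(pv)
```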