
How to create a dummy dataframe from two columns?

Let's say I have the dataframe:

a|stg1
a|stg2
a|stg3
b|stg2
b|stg3
c|stg1

and I would like to get a dataframe with dummies like this:

  stg1|stg2|stg3
a|  1 |  1 |  1
b|  0 |  1 |  1
c|  1 |  0 |  0

I have tried pandas' get_dummies, but it doesn't do the trick. I also tried building a dictionary with two for loops, and even though that works, it takes forever. There must be a more elegant and efficient solution.

Or maybe it's more of a pivot-table kind of thing? If so, what function should I use? Each value pair is unique.

You can use pd.crosstab, which builds a frequency table by default:

# 0 is the column name of `a, b, c` and 1 is that of `stg*`
>>> res = pd.crosstab(df[0], df[1])
>>> res

1  stg1  stg2  stg3
0
a     1     1     1
b     0     1     1
c     1     0     0

The 1 and 0 in the top-left corner are the column names from the original dataframe; they become the names of the result's columns axis and index axis, respectively. If they are not needed:

>>> res = res.rename_axis(index=None, columns=None)
>>> res

   stg1  stg2  stg3
a     1     1     1
b     0     1     1
c     1     0     0
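
For completeness, here is a self-contained sketch of the crosstab approach. The frame is rebuilt from the question's sample rows, and the default integer column labels 0 and 1 are an assumption that matches the snippet above; swap in your real column names. The clip step is not part of the answer above. It is only a hedge in case a pair ever repeats, since crosstab counts occurrences and a duplicate would otherwise show up as 2.

```python
import pandas as pd

# Sample rows from the question; columns get the default labels 0 and 1
# (an assumption, use your own column names here).
df = pd.DataFrame(
    [["a", "stg1"], ["a", "stg2"], ["a", "stg3"],
     ["b", "stg2"], ["b", "stg3"],
     ["c", "stg1"]]
)

# Count how often each (row label, stage) pair occurs.
counts = pd.crosstab(df[0], df[1])

# Drop the axis names carried over from the original column labels.
counts = counts.rename_axis(index=None, columns=None)

# Optional: cap counts at 1 so the table stays 0/1 even with repeated pairs.
dummies = counts.clip(upper=1)

print(dummies)
```

With unique pairs, as in the question, the clip changes nothing and can be dropped.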

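You can also use an ordinary pivot table ('A' and 'B' are your column names):

import pandas as pd

# count the rows for each (A, B) pair; missing pairs become 0
pv = pd.pivot_table(df, index='A', columns='B', aggfunc='size', fill_value=0)
# drop the axis names inherited from the original columns
pv.index.name = None
pv.columns.name = None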

print(pv)

Output:

   stg1  stg2  stg3
a   1.0   1.0   1.0
b   0.0   1.0   1.0
c   1.0   0.0   0.0
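
The output above shows floats. If you want integer 0/1 values instead, one option is to cast the result explicitly. This is a minimal sketch, assuming the columns are named 'A' and 'B' as in the snippet above and using the sample rows from the question; the astype call is an addition, not part of the original answer.

```python
import pandas as pd

# Sample rows from the question; 'A' and 'B' are placeholder column names.
df = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b", "c"],
    "B": ["stg1", "stg2", "stg3", "stg2", "stg3", "stg1"],
})

# Count rows per (A, B) pair, filling missing pairs with 0.
pv = pd.pivot_table(df, index="A", columns="B", aggfunc="size", fill_value=0)

# Cast to int so the table holds integers regardless of the intermediate dtype.
pv = pv.astype(int)

# Drop the axis names inherited from the original columns.
pv = pv.rename_axis(index=None, columns=None)

print(pv)
```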

