I have a dataframe that looks like this:
ID type period
1 2 3
1 2 3
1 3 3
2 2 3
2 3 2
2 3 2
3 2 2
There are a total of X types and X periods. Not all types/periods will be used, but I need a column to be created for every one of the X types and X periods so that the table doesn't break in the database when imported from pandas. (X is 3 in this example to keep it short, but it's really 9.)
For each ID, I need a 1 to show that a type/period was present, and a 0 to show that it was not.
The desired dataframe looks like this:
ID type_1 type_2 type_3 period_1 period_2 period_3
1 0 1 1 0 0 1
2 0 1 1 0 1 1
3 0 1 0 0 1 0
Any advice towards the right direction would be greatly appreciated! Thank you!
Starting from your DataFrame:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
ID type period
1 2 3
1 2 3
1 3 3
2 2 3
2 3 2
2 3 2
3 2 2"""), sep=' ')
>>> df
ID type period
0 1 2 3
1 1 2 3
2 1 3 3
3 2 2 3
4 2 3 2
5 2 3 2
6 3 2 2
We can use groupby on the columns 'ID' and 'type' to get the size of each group, then unstack the result, fill the NaNs with zeros, and finally convert to bool and then to int, since you want 0 and 1 values:
>>> df.groupby(['ID','type']).size().unstack(fill_value=0).astype(bool).astype(int)
type 2 3
ID
1 1 1
2 1 1
3 1 0
And for the period column:
>>> df.groupby(['ID','period']).size().unstack(fill_value=0).astype(bool).astype(int)
period 2 3
ID
1 0 1
2 1 1
3 1 0
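Note that neither result has a column for values that never occur (type 1 and period 1 here), and the columns are not yet prefixed or combined into one table. One way to get the rest of the way is to reindex each result over the full range 1..X, add a prefix, and concatenate the two pieces. This is only a rough sketch: the helper name `indicators` and the variable `X` are my own, and it assumes the possible values are exactly the integers 1 to X.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3],
    "type":   [2, 2, 3, 2, 3, 3, 2],
    "period": [3, 3, 3, 3, 2, 2, 2],
})

X = 3  # assumed number of possible types/periods (9 in the real data)


def indicators(frame, col, n):
    """Hypothetical helper: one 0/1 column per value 1..n of `col`, indexed by ID."""
    out = (
        frame.groupby(["ID", col]).size()
        .unstack(fill_value=0)
        # add columns for values that never appear in the data
        .reindex(columns=range(1, n + 1), fill_value=0)
        .astype(bool)
        .astype(int)
    )
    # e.g. 1 -> "type_1"
    return out.add_prefix(f"{col}_")


result = pd.concat(
    [indicators(df, "type", X), indicators(df, "period", X)],
    axis=1,
).reset_index()

print(result)
```

With the sample data this gives the columns ID, type_1, type_2, type_3, period_1, period_2, period_3 with the same 0/1 values as the desired table in the question. For the real data, setting X = 9 should produce all nine columns of each kind, whether or not every value is present.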