I have a Pandas dataframe which looks as follows: Starting Table:
Kode | Country | Procedure | male | male | female | female |
---|---|---|---|---|---|---|
Kode | Country | Procedure | two-year-old | three-year-old | two-year-old | three-year-old |
1a | US | proc_1 | 4 | 6 | 3 | 6 |
1a | UK | proc_1 | 2 | 3 | 5 | 1 |
1b | US | proc_2 | 15 | 3 | 5 | 2 |
1b | UK | proc_2 | 8 | 4 | 7 | 3 |
CSV:
Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3
My python code:
df = pd.read_csv('test.csv',
header=[0,1],delimiter= ';',
engine='python',
encoding= 'unicode_escape')
In fact I have two row headers How to pivot pandas dataframe to get the following result?
Resulting Table:
Code | Country | Procedure | Gender | Age | Frequency |
---|---|---|---|---|---|
1a | US | proc_1 | male | two-year-old | 4 |
1a | US | proc_1 | male | three-year-old | 6 |
1a | US | proc_1 | female | two-year-old | 3 |
1a | US | proc_1 | female | three-year-old | 6 |
1a | UK | proc_1 | male | two-year-old | 2 |
1a | UK | proc_1 | male | three-year-old | 3 |
1a | UK | proc_1 | female | two-year-old | 5 |
1a | UK | proc_1 | female | three-year-old | 1 |
1b | ... |
You'll likely want to go with melt
which is sort of the opposite of pivot:
import io
import pandas as pd
piv = pd.read_csv(io.StringIO("""Code;Country;Procedure;male;male;female;female
Code;Country;Procedure;two-year-old;three-year-old;two-year-old;three-year-old
1a;US;proc_1;4;6;3;6
1a;UK;proc_1;2;3;5;1
1b;US;proc_2;15;3;5;2
1b;UK;proc_2;8;4;7;3"""), header=[0, 1], delimiter=";")
unpivoted = piv.melt(id_vars=list(piv.columns[:3]),
var_name=['gender', 'age'],
value_name='frequency')
# cleanup the tupleized columns
unpivoted.columns = [(col[0] if isinstance(col, tuple) else col)
for col in unpivoted.columns]
Using "melt" with "pivot_table", you can try the following:
df_melt=df.melt(id_vars=['Kode','Country','Procedure'],var_name='Gender')
Extract two-year-old, three-year-old into its own column "Age_group", and remove non-numeric rows from the "value" column"
df_melt['Age_group']=df_melt['value'].str.extract('(.*-year-old)')
df_melt['Age_group']=df_melt['Age_group'].ffill()
df_melt=df_melt[pd.to_numeric(df_melt['value'],errors='coerce').notnull()]
If the original data frame is red into an editor as is, the columns named the same would be renamed as male, male.1, female, female.1, etc. To remove characters after "." from the "Gender" column
df_melt['Gender']=df_melt['Gender'].map(lambda x: x[0: x.find('.')] if '.' in x else x)
Pivot result
result=df_melt.pivot_table(index=['Kode','Country','Procedure','Gender','Age_group'],values='value')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.