Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

Question

I have the following dataframe:

     Name   rollNumber   external_roll_number    testDate      marks 

0    John      34             234               2021-04-28      15 

1    John      34             234               2021-03-28      25

I would like to convert it like this:

     Name   rollNumber   external_roll_number    testMonth      marks    testMonth      marks

0    John      34             234                  April          15       March         25

If the above is not possible then I would atleast want it to be like this:

     Name   rollNumber   external_roll_number    testDate      marks    testDate      marks

0    John      34             234                2021-04-28      15     2021-03-28       25

How can I convert my dataframe to the desired output? This change will be based on the Name column of the rows.

EDIT 1

I tried using pivot_table like this but I did not get the desired result.

merged_df_pivot = pd.pivot_table(merged_df, index=["name", "testDate"], aggfunc="first", dropna=False).fillna("")

When I try to iterate through the merged_df_pivot like this:

for index, details in merged_df_pivot.iterrows():

I am again getting two rows and also I was not able to add the new testMonth column by the above method.

Answer 1

core is unstack() month to be columns
detail then to re-structure month-by month marks columns to required structure
generally consider bad practice to have duplicate column names, hence have suffixed them

df = pd.read_csv(io.StringIO("""     Name   rollNumber   external_roll_number    testDate      marks 
0    John      34             234               2021-04-28      15 
1    John      34             234               2021-03-28      25
"""), sep="\s+")

df["testDate"] =pd.to_datetime(df["testDate"])
df = df.assign(testMonth = df["testDate"].dt.strftime("%B")).drop(columns="testDate")


dft = (df.set_index([c for c in df.columns if c!="marks"])
 .unstack("testMonth") # make month a column
 .droplevel(0, axis=1) # remove unneeded level in columns
 # create columns for months from column names and rename marks columns
 .pipe(lambda d: d.assign(**{f"testMonth_{i+1}":c 
                             for i,c in enumerate(d.columns)}).rename(columns={c:f"marks_{i+1}" 
                                                                               for i,c in enumerate(d.columns)}))
 .reset_index()
)

output

	Name	rollNumber	external_roll_number	marks_1	marks_2	testMonth_1	testMonth_2
0	John	34	234	15	25	April	March

Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

Question

1 answers

solution1
1 ACCPTED 2021-04-27 17:58:08

output

Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

Question

1 answers

solution1 1 ACCPTED 2021-04-27 17:58:08

output

solution1
1 ACCPTED 2021-04-27 17:58:08