简体   繁体   中英

Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

I have the following dataframe:

     Name   rollNumber   external_roll_number    testDate      marks 

0    John      34             234               2021-04-28      15 

1    John      34             234               2021-03-28      25

I would like to convert it like this:

     Name   rollNumber   external_roll_number    testMonth      marks    testMonth      marks

0    John      34             234                  April          15       March         25

If the above is not possible then I would atleast want it to be like this:

     Name   rollNumber   external_roll_number    testDate      marks    testDate      marks

0    John      34             234                2021-04-28      15     2021-03-28       25

How can I convert my dataframe to the desired output? This change will be based on the Name column of the rows.

EDIT 1

I tried using pivot_table like this but I did not get the desired result.

merged_df_pivot = pd.pivot_table(merged_df, index=["name", "testDate"], aggfunc="first", dropna=False).fillna("")

When I try to iterate through the merged_df_pivot like this:

for index, details in merged_df_pivot.iterrows():

I am again getting two rows and also I was not able to add the new testMonth column by the above method.

  • core is unstack() month to be columns
  • detail then to re-structure month-by month marks columns to required structure
  • generally consider bad practice to have duplicate column names, hence have suffixed them
df = pd.read_csv(io.StringIO("""     Name   rollNumber   external_roll_number    testDate      marks 
0    John      34             234               2021-04-28      15 
1    John      34             234               2021-03-28      25
"""), sep="\s+")

df["testDate"] =pd.to_datetime(df["testDate"])
df = df.assign(testMonth = df["testDate"].dt.strftime("%B")).drop(columns="testDate")


dft = (df.set_index([c for c in df.columns if c!="marks"])
 .unstack("testMonth") # make month a column
 .droplevel(0, axis=1) # remove unneeded level in columns
 # create columns for months from column names and rename marks columns
 .pipe(lambda d: d.assign(**{f"testMonth_{i+1}":c 
                             for i,c in enumerate(d.columns)}).rename(columns={c:f"marks_{i+1}" 
                                                                               for i,c in enumerate(d.columns)}))
 .reset_index()
)

output

Name rollNumber external_roll_number marks_1 marks_2 testMonth_1 testMonth_2
0 John 34 234 15 25 April March

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM