Pandas: Combine two data-frames with different shape based on one common column

Question

I have a df with columns:

Student_id      subject       marks
1               English       70
1               math          90
1               science       60
1               social        80
2               English       90
2               math          50
2               science       70
2               social        40

I have another df1 with columns

Student_id      Year_of_join   column_with_info
1                2020           some_info1
1                2020           some_info2
1                2020           some_info3
2                2019           some_info4
2                2019           some_info5

I want to combine the two data frames above (loaded from .csv files) into something like res_df below:

Student_id      subject       marks  year_of_join   column_with_info
1               English       70     2020            some_info1
1               math          90     2020            some_info2
1               science       60     2020            some_info3
1               social        80     NaN              NaN
2               English       90     2019            some_info4
2               math          50     2019            some_info5
2               science       70     NaN              NaN
2               social        40     NaN              NaN

Note: I want to join the datasets on Student_id. Both contain the same unique Student_id values, but the two datasets have different shapes.

PS: The res_df above is just one example of how the data might look after combining the two data frames. It could also look like this:

Student_id      subject       marks  year_of_join   column_with_info
1               English       70     NaN               NaN
1               math          90     2020           some_info1
1               science       60     2020           some_info2
1               social        80     2020           some_info3
2               English       90     NaN               NaN
2               math          50     NaN               NaN
2               science       70     2019            some_info4
2               social        40     2019            some_info5

Thanks in advance for the help.

Answer 1

Use GroupBy.cumcount to create a helper column that numbers the rows within each Student_id group, then merge on Student_id and that helper column with a left join:

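# Number the rows within each Student_id group: 0, 1, 2, ...
df['g'] = df.groupby('Student_id').cumcount()
df1['g'] = df1.groupby('Student_id').cumcount()

# Left-join on the id and the counter, then drop the helper column
df = df.merge(df1, on=['Student_id','g'], how='left').drop('g', axis=1)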
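
For anyone who wants to try this end to end, here is a minimal self-contained sketch built from the sample data in the question. The helper name merge_by_position and its from_end flag are hypothetical, introduced only for illustration. Passing from_end=True numbers the rows from the end of each group (cumcount(ascending=False)), which should give the second layout from the question, with the NaN rows first.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Student_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "subject": ["English", "math", "science", "social"] * 2,
    "marks": [70, 90, 60, 80, 90, 50, 70, 40],
})
df1 = pd.DataFrame({
    "Student_id": [1, 1, 1, 2, 2],
    "Year_of_join": [2020, 2020, 2020, 2019, 2019],
    "column_with_info": [f"some_info{i}" for i in range(1, 6)],
})


def merge_by_position(left, right, key="Student_id", from_end=False):
    """Hypothetical helper: pair rows of `left` and `right` by their
    position within each `key` group, keeping every row of `left`."""
    asc = not from_end
    # assign() returns copies, so the input frames are not modified
    left = left.assign(g=left.groupby(key).cumcount(ascending=asc))
    right = right.assign(g=right.groupby(key).cumcount(ascending=asc))
    return left.merge(right, on=[key, "g"], how="left").drop(columns="g")


res_df = merge_by_position(df, df1)                       # NaN rows last in each group
res_df_tail = merge_by_position(df, df1, from_end=True)   # NaN rows first in each group

print(res_df)
print(res_df_tail)
```

Because the unmatched rows hold NaN, Year_of_join comes back as a float column; if integer years matter, converting it with astype("Int64") (the nullable integer dtype) is one option.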

solution1
1 ACCEPTED 2021-03-08 12:29:15