Quick silly question - I am sure this has been asked before, but I couldn't find the details. I have a dataframe df_students as below -
Student ID   Subjects   MArks_Received   Marks
222          English    3                90
222          Maths      3                80
222          Science    3                70
223          English    2                90
223          Maths      2                80
224          Maths      2                80
I am looking for the output below. If the number of rows present for a student doesn't match the expected number of rows for that student, I need an extra column (State) set to PENDING; otherwise it should be Received.
Student ID   Subjects   Expected_Rows   Marks   State
222          English    3               90      Received
222          Maths      3               80      Received
222          Science    3               70      Received
223          English    2               90      Received
223          Maths      2               80      Received
224          Maths      2               80      PENDING
Since Expected_Rows is 2 for student 224 but only 1 row was received, that row should be marked "PENDING".
I am able to aggregate the sum of marks as below, but can't figure out how to add State. Any help is highly appreciated.
df_aggregate = df_students.groupby('Student ID')['Marks'].sum().reset_index()
There are many approaches; please see if the following helps. Add a new column 'Count', and then derive 'State' from it:
import numpy as np
# df is the question's dataframe (df_students)
df['Count'] = df.groupby('Student ID')['Student ID'].transform('count')
df['State'] = np.where(df['Count'] != df['MArks_Received'], 'PENDING','Received')
If you don't want to add the intermediate 'Count' column, use the following:
df['State'] = np.where(df.groupby('Student ID')['Student ID'].transform('count') != df['MArks_Received'], 'PENDING','Received')
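This marks as 'PENDING' the rows where the number of rows for a 'Student ID' doesn't match 'MArks_Received' (the expected row count, shown as 'Expected_Rows' in the desired output); all other rows are marked 'Received'.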
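For reference, here is a minimal end-to-end sketch of the same approach, run on the sample data from the question. The variable names rows_present and result are only illustrative, and the rename to Expected_Rows is an assumption based on the header shown in the desired output.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df_students = pd.DataFrame({
    "Student ID": [222, 222, 222, 223, 223, 224],
    "Subjects": ["English", "Maths", "Science", "English", "Maths", "Maths"],
    "MArks_Received": [3, 3, 3, 2, 2, 2],
    "Marks": [90, 80, 70, 90, 80, 80],
})

# Number of rows actually present per student, broadcast back onto every row.
rows_present = df_students.groupby("Student ID")["Student ID"].transform("count")

# Compare with the expected row count stored in MArks_Received.
df_students["State"] = np.where(
    rows_present != df_students["MArks_Received"], "PENDING", "Received"
)

# Assumption: rename the column to match the header of the desired output.
result = df_students.rename(columns={"MArks_Received": "Expected_Rows"})
print(result)
```

Because transform('count') returns a Series aligned with the original index, the comparison is done row by row without any merge, and only the single row for student 224 ends up as PENDING.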