I have the following dataset:
    Subject    Student ID                                          Student Number
0   Cit11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   45
1   EngLang11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...   45
2   EngLit11   [S110, S111, S112, S113, S114, S115, S116, S11...   21
3   Fre11      [S95, S96, S97, S99, S100, S101, S102, S103, S...   26
4   Ger11      [S114, S115, S116, S117, S118, S124, S125, S12...   13
5   His11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   45
6   Mat11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   45
7   Spa11      [S95, S97, S98, S99, S100, S102, S103, S104, S...   23
where 'Student Number' is the total number of 'Student ID' entries in each 'Subject'.
Let's say the maximum 'Student Number' should be 30 (classroom_Max_Capacity holds this value). The following code returns the indexes of the rows where 'Student Number' exceeds that maximum:
idx = filtered_Group[filtered_Group['Student Number'] > classroom_Max_Capacity].index.tolist()
Output: [0, 1, 5, 6]
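To make the filter above concrete, here is a minimal self-contained sketch on a toy frame. The student IDs, the three subjects, and the value 30 for classroom_Max_Capacity are assumptions made for illustration, not the real dataset.

```python
import pandas as pd

# Toy stand-in for filtered_Group (hypothetical data)
filtered_Group = pd.DataFrame({
    'Subject': ['Cit11', 'EngLit11', 'Mat11'],
    'Student ID': [[f'S{i}' for i in range(45)],
                   [f'S{i}' for i in range(21)],
                   [f'S{i}' for i in range(45)]],
})
# Length of each list of IDs
filtered_Group['Student Number'] = filtered_Group['Student ID'].str.len()

classroom_Max_Capacity = 30  # assumed value

# Boolean mask: True for rows whose class is over capacity
over_capacity = filtered_Group['Student Number'] > classroom_Max_Capacity
idx = filtered_Group[over_capacity].index.tolist()
print(idx)  # [0, 2]
```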
I am wondering if I can split each of those rows into two, changing the 'Subject' name and the 'Student ID' list so that every row fits the maximum student number. For example:
    Subject      Student ID                                          Student Number
0   Cit11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   30
1   Cit11_2      [S110, S115, S116...                                15
2   EngLang11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...   30
3   EngLang11_2  [S110, S115, S116...                                15
4   EngLit11     [S110, S111, S112, S113, S114, S115, S116, S11...   21
5   Fre11        [S95, S96, S97, S99, S100, S101, S102, S103, S...   26
6   Ger11        [S114, S115, S116, S117, S118, S124, S125, S12...   13
7   His11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   30
8   His11_2      [S110, S115, S116...                                15
9   Mat11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...   30
10  Mat11_2      [S110, S115, S116...                                15
11  Spa11        [S95, S97, S98, S99, S100, S102, S103, S104, S...   23
Is this even possible without explicitly writing each modified 'Subject' name into the data frame?
--edit
I attempted to solve the problem by doing something like this:
filtered = filtered_Group.iloc[idx]
student_list = filtered['Student ID'].explode().str.split(', ')
subject_list = filtered['Subject']
for i in idx:
    for number in range(classroom_Max_Capacity):
        df.append({temp_subject_list[i]: temp_student_list[number]})
But of course, this doesn't work, so any help would be much appreciated.
You can use explode and enumerate the students, and then groupby:
import numpy as np
import pandas as pd

# random data
np.random.seed(1)
df = pd.DataFrame({
    'Subject': list('abcdef'),
    'Student Number': [np.random.choice(np.arange(20),
                                        np.random.randint(3, 10),
                                        replace=False)
                       for _ in range(6)]
})
# maximum number of students allowed
max_students = 5
# output:
(df.explode('Student Number')
   .assign(section=lambda x: x.groupby('Subject')
                              .cumcount() // max_students + 1
   )
   .groupby(['Subject', 'section'])
   ['Student Number'].agg([list, 'count'])
)
Output:
                                list  count
Subject section
a       1        [15, 10, 3, 18, 17]      5
        2                [14, 16, 4]      3
b       1           [3, 2, 5, 8, 17]      5
        2                   [13, 10]      2
c       1        [11, 18, 2, 12, 16]      5
        2                 [17, 0, 4]      3
d       1               [16, 19, 11]      3
e       1         [16, 5, 4, 12, 15]      5
        2                       [19]      1
f       1          [18, 17, 3, 0, 1]      5
        2                [9, 14, 13]      3