Divide/split a row into multiple rows in a pandas DataFrame
I have the following dataset:
Subject Student ID Student Number
0 Cit11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
1 EngLang11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
2 EngLit11 [S110, S111, S112, S113, S114, S115, S116, S11... 21
3 Fre11 [S95, S96, S97, S99, S100, S101, S102, S103, S... 26
4 Ger11 [S114, S115, S116, S117, S118, S124, S125, S12... 13
5 His11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
6 Mat11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
7 Spa11 [S95, S97, S98, S99, S100, S102, S103, S104, S... 23
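Here 'Student Number' is the total number of 'Student ID' entries for each 'Subject'.
Suppose 'Student Number' should be at most 30 (the value returned by classroom_Max_Capacity). The following code returns the indices of the rows whose 'Student Number' exceeds that maximum: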
idx = filtered_Group[filtered_Group['Student Number'] > classroom_Max_Capacity].index.tolist()
Output: [0, 1, 5, 6]
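As a minimal illustration of what this filter does, here is a sketch on a made-up frame. The names toy, capacity and over, and all the values, are hypothetical stand-ins and are not taken from the dataset above.

```python
import pandas as pd

# Hypothetical stand-in for filtered_Group; the counts are invented.
toy = pd.DataFrame({
    "Subject": ["A", "B", "C"],
    "Student Number": [45, 21, 31],
})
capacity = 30  # plays the role of classroom_Max_Capacity

# Boolean mask -> keep the matching rows -> collect their index labels.
over = toy[toy["Student Number"] > capacity].index.tolist()
print(over)
```

With these toy values, rows 0 and 2 exceed the capacity, so the list holds those two index labels.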
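I would like to know whether each of these rows can be split into two rows, changing the 'Subject' name and the 'Student ID' list so that every row fits within the maximum number of students; for example: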
Subject Student ID Student Number
0 Cit11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
1 Cit11_2 [S110, S115, S116... 15
2 EngLang11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
3 EngLang11_2 [S110, S115, S116... 15
4 EngLit11 [S110, S111, S112, S113, S114, S115, S116, S11... 21
5 Fre11 [S95, S96, S97, S99, S100, S101, S102, S103, S... 26
6 Ger11 [S114, S115, S116, S117, S118, S124, S125, S12... 13
7 His11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
8 His11_2 [S110, S115, S116... 15
9 Mat11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
10 Mat11_2 [S110, S115, S116... 15
11 Spa11 [S95, S97, S98, S99, S100, S102, S103, S104, S... 23
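Is this even possible without explicitly writing out the modified 'Subject' names to add to the data frame?
- Edit
I tried to solve this by doing something like the following: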
filtered = filtered_Group.iloc[idx]
student_list = filtered['Student ID'].explode().str.split(', ')
subject_list = filtered['Subject']
for i in idx:
    for number in range(classroom_Max_Capacity):
        df.append({temp_subject_list[i]: temp_student_list[number]})
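But of course this does not work, so any help would be greatly appreciated.
You can use explode to get one row per student, enumerate the students within each subject, and then groupby: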
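import numpy as np
import pandas as pd

# random data: each subject gets 3 to 9 distinct student numbers
np.random.seed(1)
df = pd.DataFrame({
    'Subject': list('abcdef'),
    'Student Number': [np.random.choice(np.arange(20),
                                        np.random.randint(3,10),
                                        replace=False)  # sample without replacement
                       for _ in range(6)]
})
# maximum number of students allowed
max_students = 5
# one row per student, numbered within each subject; integer division
# by max_students turns that running count into a section number
(df.explode('Student Number')
   .assign(section=lambda x: x.groupby('Subject')
                              .cumcount()//max_students + 1
          )
   .groupby(['Subject','section'])
   ['Student Number'].agg([list, 'count'])
)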
Output:
list count
Subject section
a 1 [15, 10, 3, 18, 17] 5
2 [14, 16, 4] 3
b 1 [3, 2, 5, 8, 17] 5
2 [13, 10] 2
c 1 [11, 18, 2, 12, 16] 5
2 [17, 0, 4] 3
d 1 [16, 19, 11] 3
e 1 [16, 5, 4, 12, 15] 5
2 [19] 1
f 1 [18, 17, 3, 0, 1] 5
2 [9, 14, 13] 3
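To get from this result to the layout asked for in the question (suffixed 'Subject' names, a 'Student ID' list and a 'Student Number' count), the same explode/cumcount idea could be extended roughly as follows. This is only a sketch under stated assumptions: the toy IDs, the capacity of 3, and the helper names students, n_sections and result are made up for illustration, and 'Student ID' is assumed to hold real Python lists rather than strings.

```python
import pandas as pd

# Hypothetical toy data in the question's layout; IDs and sizes are made up.
filtered_Group = pd.DataFrame({
    "Subject": ["Cit11", "Ger11"],
    "Student ID": [["S1", "S2", "S3", "S4", "S5"], ["S6", "S7"]],
})
classroom_Max_Capacity = 3  # assumed small cap so the split is visible

# One row per student, with a fresh index to avoid duplicate labels.
students = filtered_Group.explode("Student ID").reset_index(drop=True)

# Number the students within each subject and derive a section number.
position = students.groupby("Subject").cumcount()
students["section"] = position // classroom_Max_Capacity + 1

# Only subjects that need more than one section get a "_<section>" suffix.
n_sections = students.groupby("Subject")["section"].transform("max")
students["Subject"] = students["Subject"].where(
    n_sections == 1,
    students["Subject"] + "_" + students["section"].astype(str),
)

# Collapse back to one row per (possibly renamed) subject.
result = (
    students.groupby("Subject", sort=False)["Student ID"]
            .agg(list)
            .reset_index()
)
result["Student Number"] = result["Student ID"].str.len()
print(result)
```

With this toy data, Cit11 is split into Cit11_1 (3 students) and Cit11_2 (2 students), while Ger11 keeps its name because it fits in a single section. Passing sort=False to groupby keeps the subjects in their original order.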