How to Arrange Dataframe values Based on the Occurrence
Suppose I have a DataFrame with the following column values:
Name | Subject | Mark
------------------------------------
Daniel | Maths | 95
Sam | Science | 98
Nathan | English | 90
Hobbs | Social | 85
Shaw | Language | 90
Daniel | Social | 99
Shaw | Science | 75
Nathan | Maths | 99
Sam | Language | 70
Hobbs | Language | 90
Shaw | Social | 90
Nathan | Social | 85
Daniel | English | 90
Nathan | Science | 85
Hobbs | English | 85
Nathan | Language | 90
Daniel | Science | 98
Sam | Social | 85
Shaw | Maths | 95
Daniel | Language | 95
Sam | Maths | 99
Hobbs | Science | 99
Sam | English | 75
Shaw | English | 90
Hobbs | Maths | 85
I need code that transforms the DataFrame into the following:
Name | Subject 1 | Mark 1 | Subject 2 | Mark 2 | Subject 3 | Mark 3 | Subject 4 | Mark 4 | Subject 5 | Mark 5
----------------------------------------------------------------------------------------------------------------------------------------------------
Daniel | Maths | 95 | Social | 99 | English | 90 | Science | 98 | Language | 95
Sam | Science | 98 | Language | 70 | Social | 85 | Maths | 99 | English | 75
Nathan | English | 90 | Maths | 99 | Social | 85 | Science | 85 | Language | 90
Hobbs | Social | 85 | Language | 90 | English | 85 | Science | 99 | Maths | 85
Shaw | Language | 90 | Science | 75 | Social | 90 | Maths | 95 | English | 90
For each name, the subjects are arranged across columns in the order in which they occur in the original DataFrame. What Python code can achieve this?
Edit #1: There are other columns, such as Reg.No and Class, whose values are unique to each student. I need a solution that keeps these columns alongside the ones shown above.
Try pivot:
In [1415]: df.pivot(index='Name', columns='Subject', values='Mark')
Out[1415]:
Subject English Language Maths Science Social
Name
Daniel 90 95 95 98 99
Hobbs 85 90 85 99 85
Nathan 90 90 99 85 85
Sam 75 70 99 98 85
Shaw 90 90 95 75 90
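Note that a plain pivot keys the columns by subject name and sorts them, so the order of occurrence asked for in the question is lost. One way to keep it, sketched here under assumptions, is to number each row within its Name group using groupby(...).cumcount() and pivot on that number instead. The small sample frame and the helper column name occ are made up for illustration.

```python
import pandas as pd

# Small illustrative sample (a subset of the question's data).
df = pd.DataFrame({
    "Name": ["Sam", "Daniel", "Sam", "Daniel"],
    "Subject": ["Science", "Maths", "Language", "Social"],
    "Mark": [98, 95, 70, 99],
})

# 1 for a student's first row, 2 for the second, and so on.
occ = df.groupby("Name").cumcount() + 1

# Pivot on the occurrence number rather than on the subject name.
wide = df.assign(occ=occ).pivot(index="Name", columns="occ", values=["Subject", "Mark"])

# Interleave the columns as Subject 1, Mark 1, Subject 2, Mark 2, ...
n = int(occ.max())
order = pd.MultiIndex.from_tuples(
    [(field, i) for i in range(1, n + 1) for field in ("Subject", "Mark")]
)
wide = wide.reindex(columns=order)
wide.columns = [f"{field} {i}" for field, i in wide.columns]

# pivot sorts the row labels; restore the order of first occurrence.
wide = wide.reindex(index=df["Name"].unique())
print(wide)
```

Students with fewer rows than the maximum would get NaN in the trailing columns.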
A little convoluted, but this should do it:
>>> (df.groupby('Name', sort=False)[['Subject', 'Mark']]
.apply(lambda x: x.stack().reset_index(drop=True))
.rename(columns=lambda x: f"{['Subject', 'Mark'][x%2]} {x//2+1}")
)
Subject 1 Mark 1 Subject 2 Mark 2 Subject 3 Mark 3 Subject 4 \
Name
Daniel Maths 95 Social 99 English 90 Science
Sam Science 98 Language 70 Social 85 Maths
Nathan English 90 Maths 99 Social 85 Science
Hobbs Social 85 Language 90 English 85 Science
Shaw Language 90 Science 75 Social 90 Maths
Mark 4 Subject 5 Mark 5
Name
Daniel 98 Language 95
Sam 99 English 75
Nathan 85 Language 90
Hobbs 99 Maths 85
Shaw 95 English 90
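To see what the lambda does, it may help to run the same steps on a single group. The sketch below builds one student's rows by hand (the frame named group is a stand-in for what groupby passes in) and shows how stack() lays the values out row by row, and how the integer positions are turned into column labels.

```python
import pandas as pd

# Stand-in for one group as groupby would pass it (original row labels kept).
group = pd.DataFrame(
    {"Subject": ["Maths", "Social"], "Mark": [95, 99]},
    index=[0, 5],
)

# stack() moves the columns into the index, giving one value per (row, column) pair,
# ordered row by row: Subject, Mark, Subject, Mark, ...
stacked = group.stack()

# Dropping the index leaves plain positions 0, 1, 2, 3.
flat = stacked.reset_index(drop=True)

# Even positions are subjects, odd positions are marks; pairs are numbered from 1.
labels = [f"{['Subject', 'Mark'][i % 2]} {i // 2 + 1}" for i in flat.index]

print(flat.tolist())
print(labels)
```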
EDIT:
If you want only the first n subjects:
>>> n = 3
>>> (df.groupby('Name', sort=False)[['Subject', 'Mark']]
.apply(lambda x: x.iloc[:n].stack().reset_index(drop=True))
.rename(columns=lambda x: f"{['Subject', 'Mark'][x%2]} {x//2+1}")
)
Subject 1 Mark 1 Subject 2 Mark 2 Subject 3 Mark 3
Name
Daniel Maths 95 Social 99 English 90
Sam Science 98 Language 70 Social 85
Nathan English 90 Maths 99 Social 85
Hobbs Social 85 Language 90 English 85
Shaw Language 90 Science 75 Social 90
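For the extra per-student columns mentioned in Edit #1 of the question, one possible approach is to add them to the group keys and reset the index at the end. This is only a sketch: it assumes Reg.No and Class are constant for each Name, and the sample values below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with two extra per-student columns.
df = pd.DataFrame({
    "Name": ["Daniel", "Sam", "Daniel", "Sam"],
    "Reg.No": [101, 102, 101, 102],
    "Class": ["A", "B", "A", "B"],
    "Subject": ["Maths", "Science", "Social", "Language"],
    "Mark": [95, 98, 99, 70],
})

# Group on the name plus every column that is fixed per student.
keys = ["Name", "Reg.No", "Class"]

wide = (
    df.groupby(keys, sort=False)[["Subject", "Mark"]]
      .apply(lambda x: x.stack().reset_index(drop=True))
      .rename(columns=lambda x: f"{['Subject', 'Mark'][x % 2]} {x // 2 + 1}")
      .reset_index()  # turn the group keys back into ordinary columns
)
print(wide)
```

If a per-student column were not actually constant, the student would be split across several rows, so it is worth checking that assumption first.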
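import pandas as pd

def flatten(l):
    # Flatten a list of lists into a single list.
    return [item for sublist in l for item in sublist]

# Column labels: Subject 1, Mark 1, ..., Subject 5, Mark 5
index = flatten([["Subject " + str(i), "Mark " + str(i)] for i in range(1, 6)])
rows = []
# sort=False keeps the names in order of first occurrence
for name, group in df.groupby('Name', sort=False):
    # Select the columns by name so that extra columns do not break the layout
    flat_list = flatten(group[['Subject', 'Mark']].values.tolist())
    rows.append(pd.Series(data=flat_list, index=index, name=name))
# One column per student, then transpose to one row per student
new_df = pd.concat(rows, axis=1).T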