How to Arrange Dataframe values Based on the Occurrence

Suppose I have a DataFrame with the following column values:

Name    |   Subject     |   Mark
------------------------------------
Daniel  |   Maths       |   95
Sam     |   Science     |   98
Nathan  |   English     |   90
Hobbs   |   Social      |   85
Shaw    |   Language    |   90
Daniel  |   Social      |   99
Shaw    |   Science     |   75
Nathan  |   Maths       |   99
Sam     |   Language    |   70
Hobbs   |   Language    |   90
Shaw    |   Social      |   90
Nathan  |   Social      |   85
Daniel  |   English     |   90
Nathan  |   Science     |   85
Hobbs   |   English     |   85
Nathan  |   Language    |   90
Daniel  |   Science     |   98
Sam     |   Social      |   85
Shaw    |   Maths       |   95
Daniel  |   Language    |   95
Sam     |   Maths       |   99
Hobbs   |   Science     |   99
Sam     |   English     |   75
Shaw    |   English     |   90
Hobbs   |   Maths       |   85

I need code that transforms the DataFrame into the following:

Name    |   Subject 1   |   Mark 1  |   Subject 2   |   Mark 2  |   Subject 3   |   Mark 3  |   Subject 4   |   Mark 4  |   Subject 5   |   Mark 5
----------------------------------------------------------------------------------------------------------------------------------------------------
Daniel  |   Maths       |   95      |   Social      |   99      |   English     |   90      |   Science     |   98      |   Language    |   95
Sam     |   Science     |   98      |   Language    |   70      |   Social      |   85      |   Maths       |   99      |   English     |   75
Nathan  |   English     |   90      |   Maths       |   99      |   Social      |   85      |   Science     |   85      |   Language    |   90
Hobbs   |   Social      |   85      |   Language    |   90      |   English     |   85      |   Science     |   99      |   Maths       |   85
Shaw    |   Language    |   90      |   Science     |   75      |   Social      |   90      |   Maths       |   95      |   English     |   90

For each name, the subjects are arranged across columns in the order in which they occur in the original DataFrame. What Python code can achieve this?

Edit #1: There are other columns like Reg.No, Class, etc., each of which holds a single value per student. So I need a solution that keeps these columns alongside the ones mentioned above.

Try pivot:

In [1415]: df.pivot(index='Name', columns='Subject', values='Mark')
Out[1415]: 
Subject  English  Language  Maths  Science  Social
Name                                              
Daniel        90        95     95       98      99
Hobbs         85        90     85       99      85
Nathan        90        90     99       85      85
Sam           75        70     99       98      85
Shaw          90        90     95       75      90
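
Note that the pivot output above orders the subject columns alphabetically, so each student's order of occurrence is lost. A minimal sketch of one way around that, assuming a small subset of the question's data: number each student's rows with groupby().cumcount() and pivot on that counter rather than on Subject. The names tmp, wide and order are only illustrative.

```python
import pandas as pd

# A small subset of the question's data, for illustration only.
df = pd.DataFrame({
    "Name": ["Daniel", "Sam", "Daniel", "Sam"],
    "Subject": ["Maths", "Science", "Social", "Language"],
    "Mark": [95, 98, 99, 70],
})

# Number each student's rows 1, 2, ... in order of appearance.
tmp = df.assign(n=df.groupby("Name").cumcount() + 1)

# Pivot on the counter instead of on Subject.
wide = tmp.pivot(index="Name", columns="n", values=["Subject", "Mark"])

# Interleave the columns as Subject 1, Mark 1, Subject 2, Mark 2, ...
n_max = int(tmp["n"].max())
order = [(col, i) for i in range(1, n_max + 1) for col in ("Subject", "Mark")]
wide = wide[order]

# Flatten the two-level column labels into plain strings.
wide.columns = [f"{col} {i}" for col, i in wide.columns]

# Restore the order in which the names first appear.
wide = wide.reindex(df["Name"].unique())
print(wide)
```

Because Subject and Mark are pivoted together, the Mark columns come back with object dtype; they can be converted afterwards if numeric columns are needed.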

A little convoluted, but this should do it:

>>> (df.groupby('Name', sort=False)[['Subject', 'Mark']]
      .apply(lambda x: x.stack().reset_index(drop=True))
      .rename(columns=lambda x: f"{['Subject', 'Mark'][x%2]} {x//2+1}")
    )
       Subject 1  Mark 1 Subject 2  Mark 2 Subject 3  Mark 3 Subject 4  \
Name                                                                     
Daniel     Maths      95    Social      99   English      90   Science   
Sam      Science      98  Language      70    Social      85     Maths   
Nathan   English      90     Maths      99    Social      85   Science   
Hobbs     Social      85  Language      90   English      85   Science   
Shaw    Language      90   Science      75    Social      90     Maths   

        Mark 4 Subject 5  Mark 5  
Name                              
Daniel      98  Language      95  
Sam         99   English      75  
Nathan      85  Language      90  
Hobbs       99     Maths      85  
Shaw        95   English      90   
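
For the extra per-student columns mentioned in Edit #1 of the question, one option is to group on those columns as well, so that they survive as index levels and can be turned back into ordinary columns. This is only a sketch: the Reg.No and Class values below are invented, and widen and id_cols are hypothetical names. It also assumes every student has the same number of rows.

```python
import pandas as pd

# Hypothetical data: the "Reg.No" and "Class" values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Daniel", "Sam", "Daniel", "Sam"],
    "Reg.No": [101, 102, 101, 102],
    "Class": ["A", "B", "A", "B"],
    "Subject": ["Maths", "Science", "Social", "Language"],
    "Mark": [95, 98, 99, 70],
})

# Columns that hold one value per student.
id_cols = ["Name", "Reg.No", "Class"]

def widen(group):
    # Flatten one student's rows into [Subject, Mark, Subject, Mark, ...].
    values = group[["Subject", "Mark"]].to_numpy().ravel()
    labels = [f"{['Subject', 'Mark'][k % 2]} {k // 2 + 1}" for k in range(len(values))]
    return pd.Series(values, index=labels)

# sort=False keeps the students in order of first appearance.
result = (df.groupby(id_cols, sort=False)[["Subject", "Mark"]]
            .apply(widen)
            .reset_index())
print(result)
```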

EDIT:

If you want only the first n subjects:

>>> n = 3
>>> (df.groupby('Name', sort=False)[['Subject', 'Mark']]
      .apply(lambda x: x.iloc[:n].stack().reset_index(drop=True))
      .rename(columns=lambda x: f"{['Subject', 'Mark'][x%2]} {x//2+1}")
    )

       Subject 1  Mark 1 Subject 2  Mark 2 Subject 3  Mark 3
Name                                                        
Daniel     Maths      95    Social      99   English      90
Sam      Science      98  Language      70    Social      85
Nathan   English      90     Maths      99    Social      85
Hobbs     Social      85  Language      90   English      85
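Shaw    Language      90   Science      75    Social      90

import pandas as pd

def flatten(l):
    # Flatten a list of lists into a single list.
    return [item for sublist in l for item in sublist]

# Column labels: Subject 1, Mark 1, ..., Subject 5, Mark 5
# (assumes exactly 5 subjects per name).
index = flatten([["Subject " + str(i), "Mark " + str(i)] for i in range(1, 6)])

rows = []
# sort=False keeps the names in order of first appearance.
for name, group in df.groupby('Name', sort=False):
    # Select the columns by name so that extra columns do not interfere.
    flat_list = flatten(group[['Subject', 'Mark']].values.tolist())
    rows.append(pd.Series(data=flat_list, index=index, name=name))

# One row per name, one column per label.
new_df = pd.DataFrame(rows)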
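
The loop above assumes every name has exactly five subjects. A rough sketch of a variant that tolerates uneven counts, using hypothetical data in which one student has fewer rows; missing subject/mark pairs simply come out as NaN.

```python
import pandas as pd

# Hypothetical data in which Sam has one subject fewer than Daniel.
df = pd.DataFrame({
    "Name": ["Daniel", "Sam", "Daniel"],
    "Subject": ["Maths", "Science", "Social"],
    "Mark": [95, 98, 99],
})

names = []
rows = []
# sort=False keeps the names in order of first appearance.
for name, group in df.groupby("Name", sort=False):
    row = {}
    # Label each (subject, mark) pair by its position within the group.
    for i, (subject, mark) in enumerate(zip(group["Subject"], group["Mark"]), start=1):
        row[f"Subject {i}"] = subject
        row[f"Mark {i}"] = mark
    names.append(name)
    rows.append(row)

# Keys missing from a row's dict are filled with NaN.
new_df = pd.DataFrame(rows, index=names)
print(new_df)
```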
