Pandas dataframe: Expanding column into rows plus incremental numbering

I need to expand a single row of my Pandas dataframe into multiple rows by splitting the score column (which contains match results) on spaces.

This is what the data looks like:

A   B   score
1   2   6-1 6-2
3   4   6-4 4-6 6-3

To achieve this, I used the approach from here.

After slightly adapting that approach, my dataframe looks like this:

A   B   score           sets
1   2   6-1 6-2         6-1
1   2   6-1 6-2         6-2
3   4   6-4 4-6 6-3     6-4
3   4   6-4 4-6 6-3     4-6
3   4   6-4 4-6 6-3     6-3

However, I would also like an additional column that gives the number of each set within its match, like a cumulative count of the sets per match. My question is: how can the solution linked above be changed to get the desired result, which looks as follows:

A   B   score           sets    setnumber
1   2   6-1 6-2         6-1     1
1   2   6-1 6-2         6-2     2
3   4   6-4 4-6 6-3     6-4     1
3   4   6-4 4-6 6-3     4-6     2
3   4   6-4 4-6 6-3     6-3     3

I think an adaptation is needed somewhere in the following lines of code, but I have not yet figured out how it should work:

s = df['score'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index

You can use repeat and then cumcount:

In [2915]: dff = df.set_index(['A', 'B'])['score'].repeat(
                            df['score'].str.split(' ').str.len()
                                 ).reset_index()

In [2916]: dff
Out[2916]:
   A  B        score
0  1  2      6-1 6-2
1  1  2      6-1 6-2
2  3  4  6-4 4-6 6-3
3  3  4  6-4 4-6 6-3
4  3  4  6-4 4-6 6-3

In [2917]: dff.assign(setnumber=dff.groupby(['A', 'B']).cumcount()+1)
Out[2917]:
   A  B        score  setnumber
0  1  2      6-1 6-2          1
1  1  2      6-1 6-2          2
2  3  4  6-4 4-6 6-3          1
3  3  4  6-4 4-6 6-3          2
4  3  4  6-4 4-6 6-3          3
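
The output above has setnumber but not the sets column from the question. The following is a minimal sketch of one way to add both, not part of the original answer. It assumes the sample data from the question, that each (A, B) pair identifies exactly one match, and that scores are separated by single spaces; the names parts and dff are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "score": ["6-1 6-2", "6-4 4-6 6-3"]})

# One list of sets per match, e.g. ["6-1", "6-2"]
parts = df["score"].str.split(" ")

# Repeat each row once per set, as in the answer above
dff = df.set_index(["A", "B"])["score"].repeat(parts.str.len()).reset_index()

# Flatten the per-match lists in row order so they line up with dff
dff["sets"] = [s for match in parts for s in match]

# Running count of sets within each match (assumes (A, B) is unique per match)
dff["setnumber"] = dff.groupby(["A", "B"]).cumcount() + 1

print(dff)
```

If (A, B) can repeat across matches, grouping on those columns would merge their counters, so a unique match identifier would be needed instead.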

You could also get dff with .loc (here the original index labels are kept, so they repeat):

In [2923]: df.loc[df.index.repeat(df['score'].str.split(' ').str.len())]
Out[2923]:
   A  B        score
0  1  2      6-1 6-2
0  1  2      6-1 6-2
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3
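
Because the repeated rows keep their original index label, the counter can also be computed per original row instead of per (A, B). The sketch below is an alternative to the approach above rather than part of the original answer. It assumes pandas 0.25 or later (for DataFrame.explode) and the sample data from the question; the name out is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "score": ["6-1 6-2", "6-4 4-6 6-3"]})

# Split each score into a list, then give every list element its own row;
# explode() repeats the original index label for each element
out = df.assign(sets=df["score"].str.split(" ")).explode("sets")

# Count rows within each original index label to number the sets
out["setnumber"] = out.groupby(level=0).cumcount() + 1

# Optional: replace the repeated labels with a fresh 0..n-1 index
out = out.reset_index(drop=True)

print(out)
```

Grouping on the index level means this does not rely on (A, B) being unique per match.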
