How can I calculate the number of consecutive values in a column within a group in a pandas dataframe?

I have a dataframe with all of a fighter's fights, the fight number (i.e. whether it is their first, second, etc.), and whether or not they won the fight. I would like to calculate the number of consecutive wins a fighter had before their current fight (i.e. not counting the result of the current fight). I am currently working with Python 3.7 in Spyder.

Suppose we have the following dataframe, where win = 1 if the fighter won the fight:

import pandas as pd

df = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                  'fight_number' :  ['1', '2', '3', '4', '1', '2', '3', '1', '2'],
                  'win' : [0, 0, 1, 1, 1, 1, 0, 1, 1]})
  fighter  fight_number  win
0       A             1     0
1       A             2     0
2       A             3     1
3       A             4     1
4       B             1     1
5       B             2     1
6       B             3     0
7       C             1     1
8       C             2     1

I know that to calculate win streaks across all rows, I can implement the solution proposed here with:

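# start a new block every time the win value changes from the previous row
grouper = (df.win != df.win.shift()).cumsum()
# select 'win' explicitly so a single Series is assigned to the new column
df['streak'] = df.groupby(grouper)['win'].cumsum()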

which produces:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       1
3       A            4    1       2
4       B            1    1       3
5       B            2    1       4
6       B            3    0       0
7       C            1    1       1
8       C            2    1       2

But what I need is to apply this approach to subgroups of the dataframe (i.e. to each fighter) and to exclude the outcome of the current fight from the streak count. In other words, I want the win streak each fighter holds when they enter the fight.

The target output in this example would therefore be:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       0
3       A            4    1       1
4       B            1    1       0
5       B            2    1       1
6       B            3    0       2
7       C            1    1       0
8       C            2    1       1

I appreciate any advice I can get on this, as I am pretty new to Python.

One solution I came up with was inspired by an earlier answer posted (but since deleted) by jezrael:

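# start a new block every time the win value changes from the previous row
grouper = (df.win != df.win.shift()).cumsum()
# running win count within each (fighter, block) pair
df['streak'] = df.groupby(['fighter', grouper])['win'].cumsum()
# shift within each fighter so the current fight is not counted
df['streak'] = df.groupby('fighter')['streak'].shift(1).fillna(0)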

which produces the target output:

  fighter fight_number  win  streak
0       A            1    0     0.0
1       A            2    0     0.0
2       A            3    1     0.0
3       A            4    1     1.0
4       B            1    1     0.0
5       B            2    1     1.0
6       B            3    0     2.0
7       C            1    1     0.0
8       C            2    1     1.0
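
The three steps above can also be wrapped in a small helper that keeps the streak as integers rather than floats. This is only a sketch, not part of the original answer: the function name `entering_streak` and its parameters are made up for illustration, and it assumes the rows are already in fight order within each fighter.

```python
import pandas as pd


def entering_streak(frame, group_col="fighter", win_col="win"):
    """Hypothetical helper: win streak each fighter carries into a fight."""
    # New run id every time the win value changes from the previous row.
    run_id = (frame[win_col] != frame[win_col].shift()).cumsum()
    # Running win count inside each (fighter, run) block; losing runs stay at 0.
    streak = frame.groupby([frame[group_col], run_id])[win_col].cumsum()
    # Shift within each fighter so the current fight is excluded.
    shifted = streak.groupby(frame[group_col]).shift(1)
    # A fighter's first fight has no history, so fill with 0 and restore ints.
    return shifted.fillna(0).astype(int)


df = pd.DataFrame({
    "fighter": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "fight_number": ["1", "2", "3", "4", "1", "2", "3", "1", "2"],
    "win": [0, 0, 1, 1, 1, 1, 0, 1, 1],
})
df["streak"] = entering_streak(df)
print(df)
```

Grouping on the fighter as well as the run id is what stops a streak from leaking across fighters: A's last two wins and B's first two wins share a run id, but they land in different groups.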

and it also seems to work on other test examples:

df2 = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 
                  'fight number' :  ["1", "2", "3", "4", "5", "6", "1", "2", "3", "1", "2"],
                  'win' : [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]}) 

grouper = (df2.win != df2.win.shift()).cumsum()
df2['streak'] = df2.groupby(['fighter', grouper])['win'].cumsum()
df2['streak'] = df2.groupby('fighter')['streak'].shift(1).fillna(0)

   fighter fight number  win  streak
0        A            1    1     0.0
1        A            2    1     1.0
2        A            3    0     2.0
3        A            4    1     0.0
4        A            5    0     1.0
5        A            6    1     0.0
6        B            1    1     0.0
7        B            2    1     1.0
8        B            3    0     2.0
9        C            1    1     0.0
10       C            2    1     1.0
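
One way to gain confidence beyond "seems to work" is to compare against a plain loop that follows the definition directly. The sketch below is an assumption-laden cross-check, not code from the thread: `entering_streak_loop` is a hypothetical name, and it assumes the fights are listed in chronological order for each fighter.

```python
def entering_streak_loop(fighters, wins):
    """Hypothetical reference: streak held by each fighter before each fight."""
    current = {}  # fighter -> consecutive wins so far
    result = []
    for fighter, win in zip(fighters, wins):
        # Record the streak before the outcome of this fight is known.
        result.append(current.get(fighter, 0))
        # A win extends the streak, a loss resets it.
        current[fighter] = (current.get(fighter, 0) + 1) if win else 0
    return result


fighters = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C"]
wins = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
reference = entering_streak_loop(fighters, wins)
print(reference)
```

For the second test frame this gives the same values as the `streak` column shown above, so the vectorised version and the definition agree on this input.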

df = df.groupby(['fighter','fight_number','win'])['win'].sum().groupby(['fighter']).cumsum().reset_index(name='streak')

For some reason joe's answer didn't quite work, but this did:

df = df.sort_values(['fighter', 'date'])
grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(['fighter', grouper])['win'].cumsum()
df.sort_index(inplace=True)
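
Note that this snippet counts the current fight, because there is no shift, and it relies on a `date` column that the question's example does not have. A sketch combining the sort with the per-fighter shift might look like the following; the `fights` frame, its `date` values, and the variable names are all invented for illustration.

```python
import pandas as pd

# Hypothetical data: rows are deliberately out of chronological order.
fights = pd.DataFrame({
    "fighter": ["B", "A", "A", "B", "A"],
    "date": pd.to_datetime(
        ["2020-03-01", "2020-02-01", "2020-01-01", "2020-01-15", "2020-04-01"]
    ),
    "win": [0, 1, 1, 1, 0],
})

# Put each fighter's fights in chronological order before counting.
ordered = fights.sort_values(["fighter", "date"])
# New run id every time the win value changes from the previous row.
run_id = (ordered["win"] != ordered["win"].shift()).cumsum()
# Running win count within each (fighter, run) block.
streak = ordered.groupby(["fighter", run_id])["win"].cumsum()
# Shift within each fighter so the current fight is excluded.
entering = streak.groupby(ordered["fighter"]).shift(1).fillna(0).astype(int)

# Assignment aligns on the index, so the original row order is preserved.
fights["streak"] = entering
print(fights)
```

Because the result is assigned back by index, there is no need to re-sort the original frame afterwards.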
