
How to create a new pandas column with an increasing sequence id, but retain the same value within each group

I have a pandas DataFrame that looks like the one below:

import pandas as pd

df = pd.DataFrame({'hourOfDay': [5, 5, 8, 8, 13, 13],
                   'category': ['pageA', 'pageB', 'pageA', 'pageB', 'pageA', 'pageB'],
                   })

    hourOfDay   category
0   5           pageA
1   5           pageB
2   8           pageA
3   8           pageB
4   13          pageA
5   13          pageB

Now I want to create a new column with a monotonically increasing id. This id should have the same value within each group (hourOfDay). The expected DataFrame is shown below.

    hourOfDay   category    index
0           5   pageA       1
1           5   pageB       1
2           8   pageA       2
3           8   pageB       2
4          13   pageA       3
5          13   pageB       3

For simplicity, we can assume for now that the category column has only two values, though this may be extended later. If I group by hourOfDay, every page category within a group should be assigned the same value. I can do this by splitting the main DataFrame into two DataFrames (filtered by category), sorting each one, creating a new column with df.groupby("hourOfDay").cumcount(), and finally merging the two. But this approach seems far too convoluted, so I was wondering whether there is a simpler way to achieve the same thing.

Try:

>>> df['index'] = df['hourOfDay'].eq(df['hourOfDay'].shift(-1)).cumsum()
>>> df
  hourOfDay category  index
0         5    pageA      1
1         5    pageB      1
2         8    pageA      2
3         8    pageB      2
4        13    pageA      3
5        13    pageB      3

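Use eq and shift(-1) to flag each row whose value equals the value in the next row, then use cumsum to add up the resulting True values. Note that this gives the expected ids only when every hourOfDay group has exactly two rows, as in the sample data; for groups of other sizes, compare each row with the previous one instead.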
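For groups of arbitrary size, a variant that compares each row with the previous one may be safer. The following is a minimal sketch, not part of the original answer; the sample data with unequal group sizes and the name starts_new_group are made up for illustration, and the rows are assumed to be already ordered by hourOfDay.

```python
import pandas as pd

# Hypothetical data: three rows for hour 5, one for hour 8, two for hour 13
df = pd.DataFrame({
    'hourOfDay': [5, 5, 5, 8, 13, 13],
    'category': ['pageA', 'pageB', 'pageC', 'pageA', 'pageA', 'pageB'],
})

# True wherever the value differs from the previous row.
# The first row is compared against NaN, so it is always True.
starts_new_group = df['hourOfDay'].ne(df['hourOfDay'].shift())

# Each True starts a new id; rows in the same run share the running total
df['index'] = starts_new_group.cumsum()
print(df)
```

Because the id only increases when the value changes, the number of rows per group no longer matters.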

If you need the same index for every row with the same hourOfDay, use GroupBy.ngroup:

df['index'] = df.groupby('hourOfDay', sort=True).ngroup() + 1
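As a rough illustration of how ngroup behaves when the rows are not sorted, here is a small sketch. The shuffled sample data and the column names index_sorted and index_seen are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical unsorted data: rows for the same hour are not adjacent
df = pd.DataFrame({
    'hourOfDay': [13, 5, 8, 5, 13, 8],
    'category': ['pageA', 'pageA', 'pageA', 'pageB', 'pageB', 'pageB'],
})

# sort=True numbers the groups by sorted key: 5 -> 1, 8 -> 2, 13 -> 3
df['index_sorted'] = df.groupby('hourOfDay', sort=True).ngroup() + 1

# sort=False numbers the groups by first appearance: 13 -> 1, 5 -> 2, 8 -> 3
df['index_seen'] = df.groupby('hourOfDay', sort=False).ngroup() + 1
print(df)
```

With sort=True the ids follow the order of the hour values themselves, so the DataFrame does not need to be sorted beforehand.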

Or use factorize:

df = df.sort_values('hourOfDay')
df['index'] = pd.factorize(df['hourOfDay'])[0] + 1
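If reordering the DataFrame is undesirable, passing sort=True to pd.factorize should give ids in sorted key order without calling sort_values first. This is a sketch on made-up unsorted data, not the original answer's code.

```python
import pandas as pd

# Hypothetical unsorted data
df = pd.DataFrame({
    'hourOfDay': [13, 5, 8, 5, 13, 8],
    'category': ['pageA', 'pageA', 'pageA', 'pageB', 'pageB', 'pageB'],
})

# codes are 0-based positions into the sorted unique values
codes, uniques = pd.factorize(df['hourOfDay'], sort=True)
df['index'] = codes + 1
print(df)
```

The row order of df stays as it was; only the new column is added.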

Use diff and cumsum:

df['index'] = df['hourOfDay'].diff().ne(0).cumsum()
print(df)

# Output:
  hourOfDay category  index
0         5    pageA      1
1         5    pageB      1
2         8    pageA      2
3         8    pageB      2
4        13    pageA      3
5        13    pageB      3
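One caveat worth checking: diff with cumsum numbers consecutive runs, so it matches a per-hour id only when equal hours are adjacent. The sketch below uses made-up data in which hour 5 appears twice, and the column names run_id and group_id are hypothetical.

```python
import pandas as pd

# Hypothetical data: hour 5 reappears after hour 8
df = pd.DataFrame({
    'hourOfDay': [5, 5, 8, 8, 5, 5],
    'category': ['pageA', 'pageB', 'pageA', 'pageB', 'pageC', 'pageD'],
})

# On the raw order, every change of value starts a new id,
# so the second block of hour 5 gets a different id than the first
df['run_id'] = df['hourOfDay'].diff().ne(0).cumsum()

# Sorting the column first makes equal hours adjacent; the assignment
# aligns on the original row labels, so df itself keeps its row order
sorted_hours = df['hourOfDay'].sort_values()
df['group_id'] = sorted_hours.diff().ne(0).cumsum()
print(df)
```

On data that is already sorted by hourOfDay, as in the question, both columns would be identical.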


