
Add new column indicating count in a pandas dataframe

I have a dataframe with some replicated rows:

item h2 h3  h4
----------------
foo  v1 ... ...
foo  v2 ... ...
foo  v1 ... ...
foo  v2 ... ...
foo  v1 ... ...
foo  v2 ... ...
foo  v1 ... ...
foo  v2 ... ...
bar  v5 ... ...
bar  v6 ... ...
bar  v7 ... ...
bar  v5 ... ...
bar  v6 ... ...
bar  v7 ... ...

My goal is to add a column (new_id) to this dataframe that holds an incrementing count of duplicate blocks (a block being a set of rows that have the same item name), prefixed with the value in the item column. If it helps, the replicated blocks will be consecutive:

item h2 h3  h4   new_id
-----------------------
foo  v1 ... ...  foo1
foo  v2 ... ...  foo1
foo  v1 ... ...  foo2
foo  v2 ... ...  foo2
foo  v1 ... ...  foo3
foo  v2 ... ...  foo3
foo  v1 ... ...  foo4
foo  v2 ... ...  foo4
bar  v5 ... ...  bar1
bar  v6 ... ...  bar1
bar  v7 ... ...  bar1
bar  v5 ... ...  bar2
bar  v6 ... ...  bar2
bar  v7 ... ...  bar2

Any suggestions on how to accomplish this?

Use GroupBy.cumcount, grouping by both columns item and h2:

df['new_id'] = df['item'] + '_' + df.groupby(['item','h2']).cumcount().add(1).astype(str)
print (df)
   item  h2   h3   h4 new_id
0   foo  v1  ...  ...  foo_1
1   foo  v2  ...  ...  foo_1
2   foo  v1  ...  ...  foo_2
3   foo  v2  ...  ...  foo_2
4   foo  v1  ...  ...  foo_3
5   foo  v2  ...  ...  foo_3
6   foo  v1  ...  ...  foo_4
7   foo  v2  ...  ...  foo_4
8   bar  v5  ...  ...  bar_1
9   bar  v6  ...  ...  bar_1
10  bar  v7  ...  ...  bar_1
11  bar  v5  ...  ...  bar_2
12  bar  v6  ...  ...  bar_2
13  bar  v7  ...  ...  bar_2
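
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame. The construction of `df` is an assumption, since the question only shows the printed table, and the `h3`/`h4` columns are left out. It also drops the `'_'` separator so the result matches the `foo1`-style IDs asked for in the question.

```python
import pandas as pd

# Sample data reconstructed from the question's table (h3/h4 omitted).
df = pd.DataFrame({
    "item": ["foo"] * 8 + ["bar"] * 6,
    "h2": ["v1", "v2"] * 4 + ["v5", "v6", "v7"] * 2,
})

# cumcount() numbers the occurrences of each (item, h2) pair starting at 0,
# so adding 1 gives the block number of the row.
block_no = df.groupby(["item", "h2"]).cumcount() + 1
df["new_id"] = df["item"] + block_no.astype(str)

print(df)
```

Note that this relies on each h2 value appearing exactly once per block, as it does in the sample data.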

Use str.cat() to concatenate column item with the cumulative count of each group in h2. The cumulative count begins at zero, so offset it by 1:

df['new_id'] = df.item.str.cat((df.groupby('h2').cumcount() + 1).astype(str), sep='')
print(df)
   item  h2   h3   h4 new_id
0   foo  v1  ...  ...   foo1
1   foo  v2  ...  ...   foo1
2   foo  v1  ...  ...   foo2
3   foo  v2  ...  ...   foo2
4   foo  v1  ...  ...   foo3
5   foo  v2  ...  ...   foo3
6   foo  v1  ...  ...   foo4
7   foo  v2  ...  ...   foo4
8   bar  v5  ...  ...   bar1
9   bar  v6  ...  ...   bar1
10  bar  v7  ...  ...   bar1
11  bar  v5  ...  ...   bar2
12  bar  v6  ...  ...   bar2
13  bar  v7  ...  ...   bar2
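
One caveat, offered as a hedged sketch: grouping by `h2` alone seems to work on the sample data only because `foo` and `bar` share no `h2` values. If they did, the counter would keep running across items. The small frame below is made up for illustration and is not from the question. Grouping by both columns avoids the problem.

```python
import pandas as pd

# Hypothetical data where both items use the same h2 values.
df = pd.DataFrame({
    "item": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "h2": ["v1", "v2", "v1", "v2", "v1", "v2"],
})

# Grouping by h2 only: the counter for v1/v2 keeps running into the bar rows.
by_h2 = df["item"].str.cat((df.groupby("h2").cumcount() + 1).astype(str))

# Grouping by item and h2: the counter restarts for each item.
by_both = df["item"].str.cat(
    (df.groupby(["item", "h2"]).cumcount() + 1).astype(str)
)

print(by_h2.tolist())
print(by_both.tolist())
```

With this data the first variant labels the bar rows `bar3`, while the second labels them `bar1`.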

