Add column with numbers based on count of value in other column in Pandas
colA is what I currently have. However, I'm trying to generate colB.
I want colB to contain the number 001 the first time a colA value appears. If the same colA value appears a second time in that column, I want the colB number for that row to be 002, and so on.
Hopefully the example below gives a better idea of what I'm looking for based on the colA values. I've been struggling to put together any real code for this.
EDIT: Struggling to explain this in words, so if you can think of a better way to explain it, feel free to update my question.
colA colB
BJ02 001
BJ02 002
CJ02 001
CJ03 001
CJ02 002
DJ01 001
DJ02 001
DJ07 001
DJ07 002
DJ07 003
You can use Counter() to keep a running count of each value in colA, then create a function to generate a list of values for colB. The count has to be updated row by row: a Counter built over the whole column would give the total frequency of each value instead of its position within the group.
from collections import Counter

def count_value(values):
    """Return a zero-padded running count for each item in `values`."""
    seen = Counter()  # how many times each value has been seen so far
    new_col = []
    for value in values:
        seen[value] += 1
        new_col.append(str(seen[value]).zfill(3))  # 1 -> '001'
    return new_col

df['colB'] = count_value(df['colA'])
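One thing worth checking with the Counter approach is the difference between total frequency and running count. The sketch below contrasts the two, assuming a DataFrame built from the sample in the question; the names totals, seen, total_labels and running_labels are only illustrative.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question (assumed to be the whole column).
df = pd.DataFrame({"colA": ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
                            "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]})

# Total occurrences per value: both BJ02 rows map to the same number.
totals = Counter(df["colA"])
total_labels = [str(totals[v]).zfill(3) for v in df["colA"]]

# Running occurrences: the counter is incremented as each row is visited.
seen = Counter()
running_labels = []
for v in df["colA"]:
    seen[v] += 1
    running_labels.append(str(seen[v]).zfill(3))

print(total_labels[:2])    # ['002', '002']
print(running_labels[:2])  # ['001', '002']
```

Only the running version reproduces the colB column shown in the question.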
Use groupby with cumcount, which numbers the rows within each group starting from 0 (hence the add(1)):
df['colB'] = df.groupby('colA').cumcount().add(1)
print(df)
# Output
   colA  colB
0  BJ02     1
1  BJ02     2
2  CJ02     1
3  CJ03     1
4  CJ02     2
5  DJ01     1
6  DJ02     1
7  DJ07     1
8  DJ07     2
9  DJ07     3
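For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame holds just the colA sample from the question, and zero_based is an illustrative name. It shows the raw cumcount result before 1 is added.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"colA": ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
                            "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]})

# cumcount numbers rows within each colA group, starting at 0,
# and returns the result aligned with the original row order.
zero_based = df.groupby("colA").cumcount()

# Shift to 1-based numbering; equivalent to .add(1).
df["colB"] = zero_based + 1

print(zero_based.tolist())  # [0, 1, 0, 0, 1, 0, 0, 0, 1, 2]
print(df["colB"].tolist())  # [1, 2, 1, 1, 2, 1, 1, 1, 2, 3]
```

Because the result keeps the original index, the rows do not need to be sorted by colA first.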
As suggested by @HenryEcker, use zfill to pad the numbers with leading zeros:
df['colB'] = df.groupby('colA').cumcount().add(1).astype(str).str.zfill(3)
print(df)
# Output:
   colA  colB
0  BJ02   001
1  BJ02   002
2  CJ02   001
3  CJ03   001
4  CJ02   002
5  DJ01   001
6  DJ02   001
7  DJ07   001
8  DJ07   002
9  DJ07   003
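The padded colB holds strings, not integers. If you prefer to control the width with a format string instead of astype(str).str.zfill(3), something like the following sketch should give the same result. It assumes the same sample data; counts and padded_zfill are illustrative names, and the format-string variant is an alternative to the answer above, not part of it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"colA": ["BJ02", "BJ02", "CJ02", "CJ03", "CJ02",
                            "DJ01", "DJ02", "DJ07", "DJ07", "DJ07"]})

# 1-based position of each row within its colA group.
counts = df.groupby("colA").cumcount() + 1

# Format each integer as a zero-padded, 3-character string.
df["colB"] = counts.map("{:03d}".format)

# The zfill route from the answer, for comparison.
padded_zfill = counts.astype(str).str.zfill(3)

print(df["colB"].tolist())
print(df["colB"].equals(padded_zfill))
```

With either approach, counts above 999 keep all their digits, so the width of 3 is a minimum and values are never truncated.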