pandas/python: creating a numerical categorical variable that counts the categories
I am trying to build a column in a pandas DataFrame that counts the category changes of a categorical variable in a "rolling" way. What I keep finding on Stack Overflow are rolling counts, which is the opposite of what I am looking for. I want a column that runs through an alphabetically sorted categorical column and increments by 1 every time the category changes, carrying the previous value forward otherwise. In the example below, given the variable 'cat_var', I need to programmatically create the column 'category_counter_var', which I built by hand here. Can someone help?
import pandas as pd
df = pd.DataFrame({
    'cat_var': ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q4', 'Q4', 'Q4', 'Q4'],
    'category_counter_var': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],  # desired result, typed in by hand
})
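Compare each value with the one in the previous row using shift(), then take the cumulative sum of the resulting boolean flags: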
df['new'] = df['cat_var'].ne(df['cat_var'].shift()).cumsum()
print(df)
# Output
  cat_var  category_counter_var  new
0      Q1                     1    1
1      Q1                     1    1
2      Q1                     1    1
3      Q2                     2    2
4      Q2                     2    2
5      Q3                     3    3
6      Q4                     4    4
7      Q4                     4    4
8      Q4                     4    4
9      Q4                     4    4
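To see why the one-liner works, it may help to split it into its three steps. The sketch below is only an illustration: the intermediate names (prev, changed, counter) are made up here and are not part of the original answer. It also compares the result with pd.factorize, which numbers distinct categories rather than changes; the two should agree only when the column is sorted, as it is in the question.

```python
import pandas as pd

df = pd.DataFrame({'cat_var': ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q4', 'Q4', 'Q4', 'Q4']})

# Step 1: the value from the previous row (missing for the first row).
prev = df['cat_var'].shift()

# Step 2: True wherever the value differs from the previous row.
# The first row is compared against a missing value, so it is True.
changed = df['cat_var'].ne(prev)

# Step 3: a running total of the True flags gives the change counter.
counter = changed.cumsum()

# Alternative (an assumption about what may also suit the asker):
# number each distinct category by order of first appearance.
codes, uniques = pd.factorize(df['cat_var'])
by_factorize = pd.Series(codes + 1, index=df.index)

# On an unsorted column the two approaches give different results.
unsorted = pd.Series(['Q1', 'Q2', 'Q1'])
run_ids = unsorted.ne(unsorted.shift()).cumsum()  # numbers each run of equal values
cat_ids = pd.factorize(unsorted)[0] + 1           # numbers each distinct category
```

If the data is not guaranteed to be sorted, the choice depends on whether a category that reappears later should get a new number (shift/ne/cumsum) or keep its original one (factorize).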