I am trying to create a new column in a pandas dataframe whose value is conditional on a different column and also depends on the previous row of the same column. The new column can be interpreted as an incremental time period that restarts with each new data field.
My desired output is: if the data field is equal to the previous data field, the new column value is the previous row's value + 1. If not, the new column value is 1.
In Excel, the formula looks like the below:
=IF(A2=A1,C1+1,1)
Below is my data:
Data Random_Columns
A Random
A Random
A Random
A Random
B Random
B Random
B Random
B Random
B Random
B Random
C Random
C Random
C Random
Below is how I want my new column to look:
Data Random_Columns New_Column
A Random 1
A Random 2
A Random 3
A Random 4
B Random 1
B Random 2
B Random 3
B Random 4
B Random 5
B Random 6
C Random 1
C Random 2
C Random 3
Every time a new value begins in the sorted dataframe, the new column should reset and restart its incremental counter from 1.
From other questions, I believe the "shift" function could be used here, but I have not been successful in getting the desired output.
Try this: create a NewCol column with a default value of 1, then use DataFrame.groupby on Data followed by Series.cumsum on each group.
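# Seed every row with 1, then take a running sum within each 'Data' group.
# Selecting ['NewCol'] keeps other columns (e.g. Random_Columns) out of the cumsum.
df['NewCol'] = df.assign(NewCol=1).groupby('Data')['NewCol'].cumsum()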
Data NewCol
0 A 1
1 A 2
2 A 3
3 A 4
4 B 1
5 B 2
6 B 3
7 B 4
8 B 5
9 B 6
10 C 1
11 C 2
12 C 3
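As a possible alternative, here is a minimal sketch of two other ways to get the same counter. It assumes the column names Data and Random_Columns from the question; the names New_Column_Runs and run_id are made up for illustration. The first variant uses groupby with cumcount, which numbers rows within each group starting at 0. The second uses shift, as the question suggested, to detect where a value differs from the row above, so the counter restarts on every consecutive run even if the dataframe is not sorted.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Data": list("AAAABBBBBBCCC"),
    "Random_Columns": ["Random"] * 13,
})

# Variant 1: cumcount numbers the rows of each group 0, 1, 2, ...;
# adding 1 makes the counter start at 1.
df["New_Column"] = df.groupby("Data").cumcount() + 1

# Variant 2 (shift-based): a new run starts wherever the value differs
# from the previous row. The cumulative sum of those change flags gives
# every consecutive run its own id (run_id is an illustrative name).
run_id = (df["Data"] != df["Data"].shift()).cumsum()
df["New_Column_Runs"] = df.groupby(run_id).cumcount() + 1

print(df)
```

On sorted data like the sample, both columns should match. They differ only when the same value reappears later in the dataframe: the first variant keeps counting across the gap, while the second restarts at 1, which is closer to what the Excel formula does.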