There are similar questions to this one but what I am really asking is a bit different.
I want to know whether the code below can be implemented without a for loop (with map or a column-wise calculation) or, failing that, what the fastest approach is.
I have a DataFrame (df) with m rows (m > 1E7) and n columns. Column j+1 is initialized with all 1s or 0s.
    # Start at 1: row 0 has no previous row (i-1 would wrap around to the last row).
    for i in range(1, len(df)):
        if df.iloc[i, j] == df.iloc[i-1, j]:
            df.iloc[i, j+1] = df.iloc[i-1, j+1] + 1
So the example output will look like:
   ...  j  j+1  ...
0  ...  3    1  ...
1  ...  4    1  ...
2  ...  4    2  ...
3  ...  4    3  ...
4  ...  6    1  ...
5  ...  6    2  ...
6  ...  7    1  ...
There are definitely existing questions that answer this:
    s = df.iloc[:, j]
    blocks = s.ne(s.shift()).cumsum()
    df.iloc[:, j+1] = s.groupby(blocks).cumcount() + 1
Output:
   ...  j  j+1  ...
0  ...  3    1  ...
1  ...  4    1  ...
2  ...  4    2  ...
3  ...  4    3  ...
4  ...  6    1  ...
5  ...  6    2  ...
6  ...  7    1  ...
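To see why this works, here is a minimal runnable sketch of the same three steps with the intermediate results named. The sample data and the column labels "j" and "j+1" are assumptions made for illustration; the original frame addresses the columns by position instead.

```python
import pandas as pd

# Hypothetical sample data; "j" and "j+1" are placeholder column names.
df = pd.DataFrame({"j": [3, 4, 4, 4, 6, 6, 7]})

s = df["j"]
# True wherever the value differs from the row above. Row 0 is always True,
# because shift() puts NaN there and nothing compares equal to NaN.
starts = s.ne(s.shift())
# Running total of run starts: all rows of one consecutive run share a label.
blocks = starts.cumsum()
# Position inside each run, counted from 1.
df["j+1"] = s.groupby(blocks).cumcount() + 1

print(blocks.tolist())     # [1, 2, 2, 2, 3, 3, 4]
print(df["j+1"].tolist())  # [1, 1, 2, 3, 1, 2, 1]
```

One thing to keep in mind: since NaN never compares equal to NaN, consecutive missing values in column j would each start a new run of length 1.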
Sounds like this is what you're after, as long as equal values in column j always sit next to each other:
    df['j+1'] = df.groupby('j').cumcount() + 1
Output:
   j  j+1
0  3    1
1  4    1
2  4    2
3  4    3
4  6    1
5  6    2
6  7    1
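Grouping by value and grouping by consecutive run give the same result on the sample above, but they can differ when a value reappears later in the column. The following sketch, on made-up data, compares the two; the variable names by_value and by_run are only illustrative.

```python
import pandas as pd

# Hypothetical data in which the value 4 comes back after a run of 6s.
df = pd.DataFrame({"j": [4, 4, 6, 6, 4]})

# Grouping by value: the last 4 continues the count from the first two rows.
by_value = df.groupby("j").cumcount() + 1

# Grouping by consecutive run: the last 4 starts again from 1,
# which is what the original row-by-row loop would produce.
s = df["j"]
by_run = s.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1

print(by_value.tolist())  # [1, 2, 1, 2, 3]
print(by_run.tolist())    # [1, 2, 1, 2, 1]
```

If the column is sorted, or each value otherwise occurs in a single contiguous run, the simpler groupby('j') version is enough.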