I'm using the pandas library in Python.
I have a data frame:
   0  1  2  3  4
0  0  0  0  1  0
1  0  0  0  0  1
2  0  0  1  0  0
3  1  0  0  0  0
4  0  0  1  0  0
5  0  1  0  0  0
6  1  0  0  1  1
Is it possible to create a new column that counts the zero cells between the last value above zero and the end of the row? Example of the desired data frame:
   0  1  2  3  4  Value
0  0  0  0  1  0      1
1  0  0  0  0  1      0
2  0  0  1  0  0      2
3  1  0  0  0  0      4
4  0  0  1  0  0      2
5  0  1  0  0  0      3
6  1  0  0  1  1      0
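The question does not show how the frame is built. A minimal sketch for reproducing it, assuming the default integer column labels 0 to 4 that pandas assigns to a list of rows:

```python
import pandas as pd

# Rows copied from the question; pandas assigns the column labels 0..4
# and the row index 0..6 by default.
df = pd.DataFrame([
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
])
print(df)
```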
Use:
df['new'] = df.iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)
print (df)
   0  1  2  3  4  new
0  0  0  0  1  0    1
1  0  0  0  0  1    0
2  0  0  1  0  0    2
3  1  0  0  0  0    4
4  0  0  1  0  0    2
5  0  1  0  0  0    3
6  1  0  0  1  1    0
Details:
First reverse the order of the columns with DataFrame.iloc and slicing:
print (df.iloc[:, ::-1])
   4  3  2  1  0
0  0  1  0  0  0
1  1  0  0  0  0
2  0  0  1  0  0
3  0  0  0  0  1
4  0  0  1  0  0
5  0  0  0  1  0
6  1  1  0  0  1
Then take the cumulative sum along each row with DataFrame.cumsum:
print (df.iloc[:, ::-1].cumsum(axis=1))
   4  3  2  1  0
0  0  1  1  1  1
1  1  1  1  1  1
2  0  0  1  1  1
3  0  0  0  0  1
4  0  0  1  1  1
5  0  0  0  1  1
6  1  2  2  2  3
Compare the values with 0 by DataFrame.eq. A cell stays True only while no non-zero value has been reached from the right:
print (df.iloc[:, ::-1].cumsum(axis=1).eq(0))
       4      3      2      1      0
0   True  False  False  False  False
1  False  False  False  False  False
2   True   True  False  False  False
3   True   True   True   True  False
4   True   True  False  False  False
5   True   True   True  False  False
6  False  False  False  False  False
Finally, count the True values in each row with sum:
print (df.iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1))
0    1
1    0
2    2
3    4
4    2
5    3
6    0
dtype: int64
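The steps above can be wrapped in a small function. This is only a sketch: the name trailing_zero_count is hypothetical, and it compares with gt(0) before the cumulative sum (a variation on the answer's code) so that negative values, if any, cannot cancel out in the running total. The all-zero row is an extra test case that is not part of the question; for such a row the count equals the number of columns.

```python
import pandas as pd

def trailing_zero_count(frame: pd.DataFrame) -> pd.Series:
    """Count the cells after the last value > 0 in each row (hypothetical helper)."""
    reversed_cols = frame.iloc[:, ::-1]                  # last column first
    seen_positive = reversed_cols.gt(0).cumsum(axis=1)   # running count of values > 0
    return seen_positive.eq(0).sum(axis=1)               # cells before the first hit

df = pd.DataFrame([
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],   # extra all-zero row, not in the question
])
counts = trailing_zero_count(df)
print(counts.tolist())  # [1, 4, 5]
```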
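Another option is argmax:
df['Value'] = df.apply(lambda x: (x.iloc[::-1] == 1).argmax(), axis=1)
Or, as an alternative to the line above (not in addition to it), using np.where:
import numpy as np
df['Value'] = np.where(df.iloc[:, ::-1] == 1, True, False).argmax(1)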
   0  1  2  3  4  Value
0  0  0  0  1  0      1
1  0  0  0  0  1      0
2  0  0  1  0  0      2
3  1  0  0  0  0      4
4  0  0  1  0  0      2
5  0  1  0  0  0      3
6  1  0  0  1  1      0
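One caveat with argmax: for a row that contains no match at all it returns 0, which looks the same as a row whose last cell is non-zero. The sketch below guards against that. It assumes an all-zero row should report the full row width (matching what the cumsum approach gives), and it tests for values above zero rather than exactly 1. The all-zero row is an extra case that is not in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],   # extra all-zero row, not in the question
])

hits = df.iloc[:, ::-1].to_numpy() > 0   # True where a reversed cell is above zero
first_hit = hits.argmax(axis=1)          # position of the first True; 0 if there is none
# Assumption: a row without any value above zero reports the full row width.
value = np.where(hits.any(axis=1), first_hit, df.shape[1])
print(value.tolist())  # [2, 5]
```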