I have a pandas DataFrame with 1.2 million rows and 10 columns.
Index  Time    a  b    c
    0     1    0  1    0
    1     2    0  0    1
    2     3  0.3  0  1.5
    3     4    0  1    0
    4     5    0  0    5
    5     6    1  0    0
    6     7    0  0    0
    7     8    0  1    5
I would like to eliminate rows of the data frame that are BEFORE the first non-zero index of column "a" AND AFTER the last non-zero index of column "a". In the case above the results should look like this:
Index  Time    a  b    c
    0     3  0.3  0  1.5
    1     4    0  1    0
    2     5    0  0    5
    3     6    1  0    0
I found another question with the same requirement, but the answer there used R. How can I do this in Python?
First compare column a against 0 for inequality with ne, then take the cumulative sum and compare it with ne(0) again. Build a second mask the same way on the reversed Series, using [::-1] to swap the order, and finally filter by boolean indexing:
m = df['a'].ne(0)
df = df[m.cumsum().ne(0) & m[::-1].cumsum().ne(0)]
print (df)
Time a b c
2 3 0.3 0 1.5
3 4 0.0 1 0.0
4 5 0.0 0 5.0
5 6 1.0 0 0.0
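To make the two masks easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and names each intermediate step. The variable names after_first, before_last and trimmed are only illustrative, not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5, 6, 7, 8],
    "a": [0, 0, 0.3, 0, 0, 1, 0, 0],
    "b": [1, 0, 0, 1, 0, 0, 0, 1],
    "c": [0, 1, 1.5, 0, 5, 0, 0, 5],
})

m = df["a"].ne(0)                      # True where a is non-zero
after_first = m.cumsum().ne(0)         # True from the first non-zero row onward
before_last = m[::-1].cumsum().ne(0)   # True up to the last non-zero row

# Boolean indexing matches the mask to df by index label,
# so the reversed order of before_last does not matter here.
trimmed = df[after_first & before_last]
print(trimmed)
```

Rows with a equal to 0 that sit between the first and last non-zero values are kept, since both masks are True there.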
The solution also works if column a contains only 0 values (it returns an empty DataFrame):
print (df)
Time a b c
0 1 0 1 0
1 2 0 0 1
6 7 0 0 0
7 8 0 1 5
m = df['a'].ne(0)
df = df[m.cumsum().ne(0) & m[::-1].cumsum().ne(0)]
print (df)
Empty DataFrame
Columns: [Time, a, b, c]
Index: []
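Just another method, using df.iloc[]. Note that it takes the last non-zero row with m.index[-1] (m.index[1] would only be the second one), and that it assumes the default 0..n-1 integer index, so that index labels equal positions:
m = df[df.a.ne(0)]
df.iloc[m.index[0]:m.index[-1] + 1]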
Index Time a b c
2 2 3 0.3 0 1.5
3 3 4 0.0 1 0.0
4 4 5 0.0 0 5.0
5 5 6 1.0 0 0.0
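If the frame does not have a default integer index, the index labels cannot be passed to iloc. A possible variant, sketched here under that assumption, finds the positions with NumPy's flatnonzero instead. The helper name trim_zero_edges and the letter index are made up for illustration.

```python
import numpy as np
import pandas as pd

def trim_zero_edges(frame, column="a"):
    """Drop leading and trailing rows where `column` is 0 (hypothetical helper)."""
    # Integer positions (not labels) of the non-zero values.
    pos = np.flatnonzero(frame[column].to_numpy() != 0)
    if pos.size == 0:
        return frame.iloc[0:0]          # no non-zero value: empty frame
    return frame.iloc[pos[0]:pos[-1] + 1]

# Same sample data as the question, but with a non-integer index.
df = pd.DataFrame(
    {
        "Time": [1, 2, 3, 4, 5, 6, 7, 8],
        "a": [0, 0, 0.3, 0, 0, 1, 0, 0],
        "b": [1, 0, 0, 1, 0, 0, 0, 1],
        "c": [0, 1, 1.5, 0, 5, 0, 0, 5],
    },
    index=list("stuvwxyz"),
)

trimmed = trim_zero_edges(df)
print(trimmed)
```

Because the slice is purely positional, this should behave the same for any index type, including one with duplicate labels.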
Let's use first_valid_index and last_valid_index with mask:
mask = df2['a'].mask(df2['a'] == 0)
start = mask.first_valid_index()
end = mask.last_valid_index()
df2.loc[start:end]
Output:
Time a b c
Index
2 3 0.3 0 1.5
3 4 0.0 1 0.0
4 5 0.0 0 5.0
5 6 1.0 0 0.0
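One thing to watch with this approach: if column a has no non-zero value at all, first_valid_index and last_valid_index both return None, and df2.loc[None:None] selects every row instead of none. A minimal guarded sketch follows; the helper name trim_by_valid_index is hypothetical, and df2 is assumed to be the sample frame with Index as its index name.

```python
import pandas as pd

def trim_by_valid_index(frame, column="a"):
    """Hypothetical wrapper around the mask approach with an all-zero guard."""
    masked = frame[column].mask(frame[column] == 0)   # zeros become NaN
    start = masked.first_valid_index()
    end = masked.last_valid_index()
    if start is None:                 # every value in the column was 0
        return frame.iloc[0:0]
    return frame.loc[start:end]       # label-based slice, end label included

df2 = pd.DataFrame(
    {
        "Time": [1, 2, 3, 4, 5, 6, 7, 8],
        "a": [0, 0, 0.3, 0, 0, 1, 0, 0],
        "b": [1, 0, 0, 1, 0, 0, 0, 1],
        "c": [0, 1, 1.5, 0, 5, 0, 0, 5],
    },
    index=pd.RangeIndex(8, name="Index"),
)

print(trim_by_valid_index(df2))        # rows 2 through 5
all_zero = df2.assign(a=0)
print(trim_by_valid_index(all_zero))   # empty frame
```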