I am trying to filter a dataset in Pandas. The number must always increase, although this can be in irregular steps. I have set up a filter to ensure any values that are smaller than their predecessor are removed from the DataFrame. This is a simple example I am working with:
import pandas as pd

test = {"Test": [1, 3, 5, 7, 9, 2, 11, 4, 13]}
df = pd.DataFrame(test)
df = df[df.Test.shift() + 1 < df.Test]
This works, except that it also drops the row at index 0, i.e. the output:
Test
1 3
2 5
3 7
4 9
6 11
8 13
is missing the first row (index 0, value 1).
Any ideas how to get this row in as well?
Try fillna with a value that would make the condition true:
df = df[df.Test.shift().fillna(df.Test - 1) < df.Test]
df:
Test
0 1
1 3
2 5
3 7
4 9
6 11
8 13
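As a possible variation on the fillna approach above, shift also accepts a fill_value argument, so the placeholder can be supplied in one step. This is only a sketch: using the first value minus one as the placeholder is an assumption chosen so that the first row passes the comparison, and the names mask and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Test": [1, 3, 5, 7, 9, 2, 11, 4, 13]})

# Fill the gap left by shift() with a value smaller than the first element,
# so the comparison for row 0 evaluates to True instead of NaN < 1 -> False.
placeholder = df.Test.iloc[0] - 1
mask = df.Test.shift(fill_value=placeholder) < df.Test

result = df[mask]
print(result)
```

Because no NaN is introduced, the shifted column keeps its integer dtype here instead of being upcast to float.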
Sample DataFrame that shows intermediate steps:
# df here is the original, unfiltered DataFrame
pd.DataFrame({
    'shifted': df.Test.shift(),
    'test': df.Test,
    'condition': df.Test.shift() < df.Test,
    'shifted then filled': df.Test.shift().fillna(df.Test - 1),
    'fixed condition': df.Test.shift().fillna(df.Test - 1) < df.Test
})
shifted  test  condition  shifted then filled  fixed condition
    NaN     1      False                  0.0             True
    1.0     3       True                  1.0             True
    3.0     5       True                  3.0             True
    5.0     7       True                  5.0             True
    7.0     9       True                  7.0             True
    9.0     2      False                  9.0            False
    2.0    11       True                  2.0             True
   11.0     4      False                 11.0            False
    4.0    13       True                  4.0             True
The issue is that in the first row, NaN is not less than 1 (NaN < 1 => False).
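The NaN behaviour described here can be checked in isolation. The following is a minimal sketch on a shortened series (the values and the variable names are illustrative, not from the original data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5])

shifted = s.shift()      # first element has no predecessor, so it becomes NaN
mask = shifted < s       # any ordering comparison against NaN is False

nan_less_than_one = bool(np.nan < 1)
print(nan_less_than_one)
print(mask.tolist())
```

Since the first entry of the mask is False, boolean indexing with it drops row 0 even though nothing precedes it.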
So try with
df[~(df.Test.diff()<0)]
Test
0 1
1 3
2 5
3 7
4 9
6 11
8 13
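To make it clearer why the diff version keeps row 0, here is a sketch that splits the one-liner into steps (the intermediate names delta and drops are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Test": [1, 3, 5, 7, 9, 2, 11, 4, 13]})

delta = df.Test.diff()   # current value minus previous value; NaN for row 0
drops = delta < 0        # True only where the value went down; NaN < 0 is False
result = df[~drops]      # negating the mask keeps row 0 and every non-decreasing row

print(result)
```

One difference worth noting: this condition keeps a row whose value equals its predecessor (a difference of 0 is not negative), whereas the strict `shift() < Test` comparison would drop it. Which behaviour is wanted depends on whether repeated values count as "increasing" for the dataset.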