I have a dataframe like the one below:
   detaildate  detailquantity
0  2012-02-09             7.0
1  2011-05-27            -1.0
2  2011-05-04            -2.0
3  2012-03-19            -2.0
4  2012-03-18            -3.0
I want to first sort the dataframe by detaildate and then slice it from the first positive value of detailquantity to the last row.
The resulting dataframe should look like this:
   detaildate  detailquantity
0  2012-02-09             7.0
4  2012-03-18            -3.0
3  2012-03-19            -2.0
I am trying the code below, but it produces an empty dataframe and I cannot figure out why:
df.sort_values(by='detaildate', inplace=True)
df = df[df[df['detailquantity'] > 0].first_valid_index():]
What is wrong with the above code?
Use Series.cumsum on the boolean mask and keep the rows where the cumulative sum is greater than 0. This solution also works correctly if all values are negative (the result is then an empty dataframe):
df.sort_values(by='detaildate', inplace=True)
df = df[(df['detailquantity'] > 0).cumsum() > 0]
print(df)
   detaildate  detailquantity
0  2012-02-09             7.0
4  2012-03-18            -3.0
3  2012-03-19            -2.0
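To make the mask easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and prints nothing but the final result. The variable names `is_positive`, `seen_positive` and `result` are only illustrative, and the dates are assumed to be ISO-formatted strings (which sort chronologically as plain text).

```python
import pandas as pd

# Sample data from the question; dates are kept as ISO strings (an assumption).
df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
})

df = df.sort_values(by="detaildate")

# True only on rows whose quantity is positive.
is_positive = df["detailquantity"] > 0

# The running count of positives stays 0 until the first positive row,
# so "> 0" is True from that row through the end of the frame.
seen_positive = is_positive.cumsum() > 0

result = df[seen_positive]
print(result)
```

Because the mask is all False when no quantity is positive, this approach yields an empty dataframe in that case instead of raising an error or returning every row.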
Your approach also works after resetting the index to a default unique one, so that the index labels match the row positions, but it requires at least one positive value:
df.sort_values(by='detaildate', inplace=True)
df = df.reset_index(drop=True)
df = df.loc[(df['detailquantity'] > 0).idxmax():]
print(df)
   detaildate  detailquantity
2  2012-02-09             7.0
3  2012-03-18            -3.0
4  2012-03-19            -2.0
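The reason the index matters can be seen in a small sketch. After sorting, the index label of the first positive row is no longer the same as its row position, and an integer slice inside plain `[]` is interpreted by position rather than by label. With the sample data this keeps every row; with other data the label can point past the end of the frame, which would give an empty result. The names `first_label`, `first_pos`, `by_position` and `by_label` are only illustrative.

```python
import pandas as pd

# Sample data from the question, sorted by date without resetting the index.
df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
}).sort_values(by="detaildate")

positive = df["detailquantity"] > 0

# Index label of the first positive row (the original, pre-sort label).
first_label = positive.idxmax()

# Row position of the same row in the sorted frame.
first_pos = positive.to_numpy().argmax()

# An integer slice in plain [] is positional, so the label is misread
# as a row offset here.
by_position = df[first_label:]

# .loc slices by label, so it starts at the intended row.
by_label = df.loc[first_label:]

print(first_label, first_pos)
print(len(by_position), len(by_label))
```

Resetting the index with `reset_index(drop=True)` makes labels and positions coincide, which is why the corrected version above behaves as intended.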
Another alternative using numpy:
df.sort_values(by='detaildate', inplace=True)
df = df.iloc[(df['detailquantity'].values > 0).argmax():]
print(df)
   detaildate  detailquantity
0  2012-02-09             7.0
4  2012-03-18            -3.0
3  2012-03-19            -2.0
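Note that `argmax` (like `idxmax`) returns the first position when the mask contains no True value at all, so both variants would then return the whole dataframe. If that case can occur, one possible guard is sketched below; the helper name `slice_from_first_positive` is hypothetical and not part of pandas or numpy.

```python
import pandas as pd


def slice_from_first_positive(frame, column):
    """Return rows from the first positive value in `column` to the end.

    Hypothetical helper: returns an empty frame when no value is positive.
    """
    mask = frame[column].to_numpy() > 0
    if not mask.any():
        # No positive value: argmax would return 0 and keep everything,
        # so return an empty slice with the same columns instead.
        return frame.iloc[0:0]
    # argmax gives the position of the first True in the boolean array.
    return frame.iloc[mask.argmax():]


df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
}).sort_values(by="detaildate")

print(slice_from_first_positive(df, "detailquantity"))

all_negative = pd.DataFrame({"detailquantity": [-1.0, -2.0, -3.0]})
print(slice_from_first_positive(all_negative, "detailquantity").empty)
```

With the guard in place, this version matches the behaviour of the cumsum solution when every value is negative.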