Sort and Slice DataFrame in Pandas
I have a dataframe like the one below:
detaildate detailquantity
0 2012-02-09 7.0
1 2011-05-27 -1.0
2 2011-05-04 -2.0
3 2012-03-19 -2.0
4 2012-03-18 -3.0
I want to first sort the above dataframe by detaildate and then slice it from the first positive value of detailquantity to the last index.
The resulting dataframe should look like this:
detaildate detailquantity
0 2012-02-09 7.0
4 2012-03-18 -3.0
3 2012-03-19 -2.0
I am trying the code below, but it results in an empty dataframe and I cannot figure out why:
df.sort_values(by='detaildate', inplace=True)
df = df[df[df['detailquantity'] > 0].first_valid_index():]
What is wrong with the above code?
Use Series.cumsum with a boolean mask and test for values greater than 0. This solution also works correctly if all values are negative:
df.sort_values(by='detaildate', inplace=True)
df = df[(df['detailquantity'] > 0).cumsum() > 0]
print(df)
detaildate detailquantity
0 2012-02-09 7.0
4 2012-03-18 -3.0
3 2012-03-19 -2.0
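
The mask works because the running count of positive values stays at 0 until the first positive row and is at least 1 from there on. A minimal self-contained sketch follows; the way `df` is constructed and the helper name `slice_from_first_positive` are assumptions made for illustration, not part of the original post.

```python
import pandas as pd

# Sample data rebuilt from the question (this construction is an assumption).
df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
})


def slice_from_first_positive(frame):
    """Hypothetical helper: sort by date, keep rows from the first positive quantity on."""
    ordered = frame.sort_values(by="detaildate")
    # cumsum over the boolean mask counts how many positives have been seen so far
    seen_positive = (ordered["detailquantity"] > 0).cumsum() > 0
    return ordered[seen_positive]


result = slice_from_first_positive(df)

# With no positive value the count never leaves 0, so the result is empty.
all_negative = slice_from_first_positive(df[df["detailquantity"] < 0])
```

Because the filter is a plain boolean mask, it does not depend on the index labels being ordered or reset.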
Your solution can be fixed by first creating a unique index, but it requires at least one matching value:
df.sort_values(by='detaildate', inplace=True)
df = df.reset_index(drop=True)
df = df.loc[(df['detailquantity'] > 0).idxmax():]
print(df)
detaildate detailquantity
2 2012-02-09 7.0
3 2012-03-18 -3.0
4 2012-03-19 -2.0
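
The "at least one value matched" caveat matters because idxmax on an all-False mask returns the first label, so the slice would silently keep every row. One possible guard is sketched below; the helper name `slice_with_idxmax` and the choice to return an empty frame are assumptions, not something the answer prescribes.

```python
import pandas as pd


def slice_with_idxmax(frame):
    """Hypothetical helper: idxmax-based slice that returns no rows when nothing is positive."""
    ordered = frame.sort_values(by="detaildate").reset_index(drop=True)
    positive = ordered["detailquantity"] > 0
    if not positive.any():
        # idxmax would return the first label here, so handle this case explicitly
        return ordered.iloc[0:0]
    # idxmax gives the label of the first True value in the mask
    return ordered.loc[positive.idxmax():]


# Sample data rebuilt from the question (this construction is an assumption).
df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
})

result = slice_with_idxmax(df)
no_match = slice_with_idxmax(df[df["detailquantity"] < 0])
```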
Another alternative using numpy:
df.sort_values(by='detaildate', inplace=True)
df = df.iloc[(df['detailquantity'].values > 0).argmax():]
print(df)
detaildate detailquantity
0 2012-02-09 7.0
4 2012-03-18 -3.0
3 2012-03-19 -2.0
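
This version needs no reset_index because argmax returns a position in the sorted order rather than an index label, and iloc slices by position. The sketch below makes the difference visible; the variable names are illustrative assumptions. Note that argmax also returns 0 when no value is positive, so the same "at least one match" caveat applies.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (this construction is an assumption).
df = pd.DataFrame({
    "detaildate": ["2012-02-09", "2011-05-27", "2011-05-04",
                   "2012-03-19", "2012-03-18"],
    "detailquantity": [7.0, -1.0, -2.0, -2.0, -3.0],
})

ordered = df.sort_values(by="detaildate")
mask = ordered["detailquantity"].to_numpy() > 0

start_pos = int(np.argmax(mask))        # position of the first True in sorted order
start_label = ordered.index[start_pos]  # original label sitting at that position

tail = ordered.iloc[start_pos:]         # positional slice keeps the original labels
```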