pandas strictly increasing column group by
import pandas as pd

dates = ['2020-12-01', '2020-12-03', '2020-12-04', '2020-12-01', '2020-12-03', '2020-12-04']
symbols = ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF']
v = [1, 3, 5, 7, 9, 8]
df = pd.DataFrame({'date': dates, 'g': symbols, 'v': v})
date g v
0 2020-12-01 ABC 1
1 2020-12-03 ABC 3
2 2020-12-04 ABC 5
3 2020-12-01 DEF 7
4 2020-12-03 DEF 9
5 2020-12-04 DEF 8
I want to create a new dataframe, grouped by 'g', that tells me whether 'v' is strictly increasing within each group or not.
For example, the output would be:
g increasing
0 ABC 1
1 DEF 0
since ABC is always increasing whereas DEF is not.
I thought maybe I could use diff() and then select the names that have negative values (these names I can then exclude from the list). But I lose the grouping column when I use this function:
df.groupby(by='g')['v'].diff()
0 NaN
1 2.0
2 2.0
3 NaN
4 2.0
5 -1.0
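One possible way to keep the group labels with the diff() approach is to regroup the differences by the original 'g' column. This is only a sketch on the question's data; the names d, breaks and result are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-12-01', '2020-12-03', '2020-12-04'] * 2,
    'g': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF'],
    'v': [1, 3, 5, 7, 9, 8],
})

# Per-group differences; the first row of each group is NaN.
d = df.groupby('g')['v'].diff()

# A row breaks strictness when its difference is <= 0 (NaN compares as False).
breaks = d.le(0)

# Regroup the flags by the original labels: a group is strictly
# increasing when none of its rows break.
increasing = (~breaks.groupby(df['g']).any()).astype(int)
result = increasing.rename('increasing').reset_index()
print(result)
```

Because diff() returns a Series aligned with the original row index, passing df['g'] as the grouping key restores the association with each group.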
What is the best way to do this?
The following looks good but is NOT what I want (since it returns True even if the value stays the same):
>>> df.groupby(by='g')['v'].is_monotonic_increasing.reset_index()
g v
0 ABC True
1 DEF False
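To illustrate the problem with repeated values, here is a small sketch on a made-up Series (the values are hypothetical, not from the question's data). It compares the plain monotonic check with one that also requires uniqueness:

```python
import pandas as pd

# Hypothetical values containing a tie (2, 2).
s = pd.Series([1, 2, 2, 3])

# is_monotonic_increasing allows equal neighbours (non-decreasing).
non_decreasing = s.is_monotonic_increasing

# Requiring uniqueness as well rules out ties, which gives a strict check.
strict = s.is_monotonic_increasing and s.is_unique

print(non_decreasing, strict)
```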
You just need to check that the values are monotonically increasing and that all the elements are unique. For this you could use pandas is_monotonic_increasing and unique:
res = df.groupby('g', as_index=False)['v'].apply(lambda x: len(x) == len(x.unique()) and x.is_monotonic_increasing)
print(res)
Output
g
ABC True
DEF False
Name: v, dtype: bool
As an alternative, use duplicated to check that all the values are unique:
res = df.groupby('g', as_index=False)['v'].apply(lambda x: (~x.duplicated()).all() and x.is_monotonic_increasing)
print(res)
Output
g v
0 ABC True
1 DEF False
A third alternative is to use numpy and verify that all the differences between consecutive elements are greater than 0:
import numpy as np
res = df.groupby('g', as_index=False)['v'].apply(lambda x: np.all(np.diff(x) > 0))
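As a self-contained sketch of the numpy variant, the snippet below adds a hypothetical third group 'TIE' (not in the question's data) to show that repeated values are rejected. The helper name strictly_increasing is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'g': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'TIE', 'TIE'],
    'v': [1, 3, 5, 7, 9, 8, 4, 4],  # 'TIE' is a hypothetical extra group
})

def strictly_increasing(x):
    # np.diff gives consecutive differences; all of them must be > 0.
    # A one-row group yields an empty array, for which np.all returns True.
    return bool(np.all(np.diff(x.to_numpy()) > 0))

res = df.groupby('g')['v'].apply(strictly_increasing)
print(res.to_dict())
```

Note that with this check a group containing a single row counts as strictly increasing; whether that is desirable depends on the use case.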
Thank you, Dani, for the answer. I had to make a small change to make the 'g' column appear:
df.groupby('g', as_index=True)['v'].apply(lambda x: len(x) == len(x.unique()) and x.is_monotonic_increasing).reset_index()
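To get the exact shape asked for in the question (a 0/1 column named 'increasing'), one option is to cast the booleans to integers and name the column while resetting the index. This is a sketch assuming the question's data; it uses Series.is_unique in place of the length comparison, which should be equivalent here.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-12-01', '2020-12-03', '2020-12-04'] * 2,
    'g': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF'],
    'v': [1, 3, 5, 7, 9, 8],
})

out = (
    df.groupby('g')['v']
      # Strictly increasing = non-decreasing and no repeated values.
      .apply(lambda x: x.is_unique and x.is_monotonic_increasing)
      .astype(int)                     # True/False -> 1/0
      .reset_index(name='increasing')  # move 'g' back into a column
)
print(out)
```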