
pandas: check whether a column is strictly increasing within each group

import pandas as pd

dates = ['2020-12-01', '2020-12-03', '2020-12-04', '2020-12-01', '2020-12-03', '2020-12-04']
symbols = ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF']
v = [1, 3, 5, 7, 9, 8]

df = pd.DataFrame({'date': dates, 'g': symbols, 'v': v})

         date    g  v
0  2020-12-01  ABC  1
1  2020-12-03  ABC  3
2  2020-12-04  ABC  5
3  2020-12-01  DEF  7
4  2020-12-03  DEF  9
5  2020-12-04  DEF  8

I want to build a new DataFrame, grouped by 'g', that tells me whether 'v' is strictly increasing within each group.
For example, the output should be:
     g  increasing
0  ABC           1
1  DEF           0
since ABC is always increasing whereas DEF is not.

I thought I could use diff() and then pick out the groups that have negative values (those are the ones I can exclude from the list). But I lose the grouping column when I use this function:

df.groupby(by='g')['v'].diff()
0    NaN
1    2.0
2    2.0
3    NaN
4    2.0
5   -1.0

What is the best way to do this?

The following looks good but is NOT what I want, since it returns True even if the value stays the same:

>>> df.groupby(by='g')['v'].is_monotonic_increasing.reset_index()
     g      v
0  ABC   True
1  DEF  False

You just need to check that the values are monotonic increasing and that all the elements are unique. For this you could use pandas is_monotonic_increasing and unique:

res = df.groupby('g', as_index=False)['v'].apply(lambda x: len(x) == len(x.unique()) and x.is_monotonic_increasing)
print(res)

Output

g
ABC     True
DEF    False
Name: v, dtype: bool
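
To see why the uniqueness check matters, here is a minimal sketch on a made-up group that contains a tie. The 'XYZ' data is hypothetical and not part of the question; is_unique is used here as a shorthand for comparing len(x) with len(x.unique()).

```python
import pandas as pd

# Hypothetical data: the value 2 repeats, so the group is not strictly increasing.
ties = pd.DataFrame({'g': ['XYZ', 'XYZ', 'XYZ'], 'v': [1, 2, 2]})
s = ties['v']

# is_monotonic_increasing allows equal neighbours (non-strict check).
non_strict = s.is_monotonic_increasing

# Requiring unique values as well turns it into a strict check.
strict = s.is_monotonic_increasing and s.is_unique

print(non_strict, strict)
```

For a sequence that is already non-decreasing, any repeated value must sit next to its twin, so "monotonic and unique" is equivalent to "strictly increasing".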

As an alternative, use duplicated to check whether all the values are unique:

res = df.groupby('g', as_index=False)['v'].apply(lambda x: (~x.duplicated()).all() and x.is_monotonic_increasing)
print(res)

Output

     g      v
0  ABC   True
1  DEF  False

A third alternative is to use numpy and verify that all the differences between consecutive elements are greater than 0:

import numpy as np
res = df.groupby('g', as_index=False)['v'].apply(lambda x: np.all(np.diff(x) > 0))
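
The diff() idea from the question can also be made to work without apply, by regrouping the differences on the original key. This is only a sketch, and it assumes 'v' contains no missing values: a NaN in the data would be treated like the start of a group.

```python
import pandas as pd

df = pd.DataFrame({
    'g': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF'],
    'v': [1, 3, 5, 7, 9, 8],
})

# Per-group differences; the first row of each group has no predecessor, so it is NaN.
d = df.groupby('g')['v'].diff()

# A row passes if it starts a group (NaN) or is larger than the previous row.
ok = d.isna() | (d > 0)

# Regroup by the original key so the group label is kept in the result.
res = ok.groupby(df['g']).all()
print(res)
```

Passing df['g'] to the second groupby is what restores the grouping information that diff() drops.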

Thank you, Dani, for the answer. I had to make a small change to make the 'g' column appear:

df.groupby('g', as_index=True)['v'].apply(lambda x: len(x) == len(x.unique()) and x.is_monotonic_increasing).reset_index()
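
If the 1/0 'increasing' column from the question is wanted, something along these lines might do. The helper name is_strictly_increasing is made up for illustration, and the check is the numpy one from the answer above.

```python
import numpy as np
import pandas as pd

def is_strictly_increasing(values):
    # Hypothetical helper: True when every consecutive difference is positive.
    return bool(np.all(np.diff(np.asarray(values)) > 0))

df = pd.DataFrame({
    'g': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF'],
    'v': [1, 3, 5, 7, 9, 8],
})

out = (
    df.groupby('g')['v']
      .apply(is_strictly_increasing)  # one boolean per group, indexed by 'g'
      .astype(int)                    # True/False -> 1/0
      .rename('increasing')           # column name requested in the question
      .reset_index()                  # turn the 'g' index back into a column
)
print(out)
```

Note that a group with a single row counts as increasing here, because there are no differences to fail the check.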
