
Count unique values over rolling window which meet a condition

I have data resembling the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['a','a','b','c','a','a','c','b', 'b'],
    'cond': [True, True, False, True, False, True, True, True, True]
})

I'd like to create a new column which counts the number of unique values of cat within a rolling window, where a value is counted only if every one of its occurrences in that window is True per cond.

So the expected output for the df above with rolling(window=3) would be:

df['manual_count'] = pd.Series([np.nan,np.nan,1.0,2.0,1.0,1.0,1.0,3.0,2.0])
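For reference, the expected values above can be reproduced by brute force. The following is only a sketch under assumptions: `count_all_true` is a hypothetical helper name, not part of pandas, and it loops over every full window explicitly, so it is meant as a correctness check rather than a fast solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['a', 'a', 'b', 'c', 'a', 'a', 'c', 'b', 'b'],
    'cond': [True, True, False, True, False, True, True, True, True],
})

def count_all_true(frame, window):
    """Hypothetical helper: brute-force count for each full window."""
    out = np.full(len(frame), np.nan)  # incomplete windows stay NaN
    for end in range(window - 1, len(frame)):
        chunk = frame.iloc[end - window + 1 : end + 1]
        # A category counts only if all of its rows in this window are True.
        out[end] = chunk.groupby('cat')['cond'].all().sum()
    return pd.Series(out, index=frame.index)

reference = count_all_true(df, 3)
```

Any faster vectorised approach can then be compared against `reference`.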

I've only got as far as counting unique occurrences without the condition, which is fairly straightforward:

df['all'] = (
    pd.Series(df['cat'].factorize()[0])
    .rolling(3)
    .apply(lambda x: x.nunique())
)

But introducing the condition has me stumped. I think the answer lies with groupby/apply, but I can't quite put them together as needed. I'd appreciate any help!

[EDIT] Final solution using Myrl's excellent idea:

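# Count categories that have at least one False in the window:
# keep the factorized code where cond is False, mask everything else with -1.
df['false_once'] = (
    pd.Series(df['cat'].factorize()[0])
    .where(~df['cond'], -1)
    .rolling(3)
    .apply(lambda x: x[x>=0].nunique())
)
# Categories whose occurrences are all True = all categories minus those with a False.
df['true_all'] = df['all'] - df['false_once']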
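As a self-contained check, the sketch below recomputes both rolling counts from scratch and compares the difference with the expected values from the question. The variable names `codes`, `all_count`, `false_once` and `true_all` are chosen here for illustration, and `raw=False` is passed explicitly on the assumption that each window should reach the lambda as a Series (so that `.nunique()` is available).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['a', 'a', 'b', 'c', 'a', 'a', 'c', 'b', 'b'],
    'cond': [True, True, False, True, False, True, True, True, True],
})

# Integer code per category, so that rolling() can operate on numbers.
codes = pd.Series(df['cat'].factorize()[0], index=df.index)

# Unique categories per window, ignoring the condition.
all_count = codes.rolling(3).apply(lambda x: x.nunique(), raw=False)

# Unique categories per window that have at least one False row:
# rows where cond is True are masked with -1 and filtered out before counting.
false_once = (
    codes.where(~df['cond'], -1)
    .rolling(3)
    .apply(lambda x: x[x >= 0].nunique(), raw=False)
)

# Categories whose rows in the window are all True.
true_all = all_count - false_once

expected = pd.Series([np.nan, np.nan, 1.0, 2.0, 1.0, 1.0, 1.0, 3.0, 2.0])
```

The subtraction works because every category present in a window either has at least one False row or has only True rows, so the two groups add up to the total unique count.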

How about masking the column according to df["cond"], replacing the elements that do not satisfy the criterion with a marker such as -1? Since pd.factorize returns nonnegative integers for non-missing values, you can drop the negative values before counting unique elements. Here's a quick one-liner to convey the idea:

(
    pd.Series(df['cat'].factorize()[0])
    .where(df['cond'], -1)   # mask rows where cond is False
    .rolling(3)
    .apply(lambda x: x[x >= 0].nunique())   # >= 0 keeps code 0; the -1 markers are dropped
)
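One caveat worth illustrating: masking with `df['cond']` directly counts categories that are True at least once in the window, which is not quite the "all occurrences are True" count the question asks for. The sketch below (the names `codes` and `true_once` are illustrative) uses the mask `x >= 0` so that the category with code 0 is not dropped, and shows where the two counts diverge on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'cat': ['a', 'a', 'b', 'c', 'a', 'a', 'c', 'b', 'b'],
    'cond': [True, True, False, True, False, True, True, True, True],
})

codes = pd.Series(df['cat'].factorize()[0], index=df.index)

# Unique categories per window with at least one True row.
true_once = (
    codes.where(df['cond'], -1)
    .rolling(3)
    .apply(lambda x: x[x >= 0].nunique(), raw=False)
)

# Rows 3..5 are c(True), a(False), a(True): 'a' is True once but not always,
# so this count includes it while the all-True count from the question does not.
window_3_to_5 = true_once.iloc[5]
```

This is why masking on `~df['cond']` and subtracting from the unconditional count is one way to get from this idea to the all-True count.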
