Python pandas: rolling distinct count over a time window on a string column

Question

I want to count the number of unique values in column A within a moving 60-second time window:

import numpy as np
import pandas as pd

# Count the distinct values in one window
fn = lambda x: len(np.unique(x))

df = pd.DataFrame({'A': ['a', 'b', 'a', 'b', 'e'], 'B': [0, 1, 2, 3, 4]},
                  index=[pd.Timestamp('20130101 09:01:00'),
                         pd.Timestamp('20130101 09:01:32'),
                         pd.Timestamp('20130101 09:02:03'),
                         pd.Timestamp('20130101 09:02:25'),
                         pd.Timestamp('20130101 09:03:06')])

df[['A']].rolling('60s').apply(fn)

I expect this result:

2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 2
2013-01-01 09:03:06 2

However, the actual result is:

2013-01-01 09:01:00 a
2013-01-01 09:01:32 b
2013-01-01 09:02:03 a
2013-01-01 09:02:25 b
2013-01-01 09:03:06 e

What is the problem?

Answer 1

You can use the numeric column B instead of A:

a = df[['B']].rolling('60s').apply(fn)
print(a)
                       B
2013-01-01 09:01:00  1.0
2013-01-01 09:01:32  2.0
2013-01-01 09:02:03  2.0
2013-01-01 09:02:25  3.0
2013-01-01 09:03:06  2.0
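
The likely reason B works where A does not is the dtype: rolling aggregations operate on numeric data, and column A holds strings. Depending on the pandas version, a string column may be returned unchanged (as in the question) or raise an error. A minimal check, with the sample frame from the question rebuilt so the snippet is self-contained (the variable names `a_is_numeric` and `b_is_numeric` are illustrative only):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample frame from the question
df = pd.DataFrame(
    {'A': ['a', 'b', 'a', 'b', 'e'], 'B': [0, 1, 2, 3, 4]},
    index=pd.to_datetime([
        '2013-01-01 09:01:00',
        '2013-01-01 09:01:32',
        '2013-01-01 09:02:03',
        '2013-01-01 09:02:25',
        '2013-01-01 09:03:06',
    ]),
)

a_is_numeric = is_numeric_dtype(df['A'])  # string column: not numeric
b_is_numeric = is_numeric_dtype(df['B'])  # integer column: numeric
print(a_is_numeric, b_is_numeric)
```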

If you need integers, convert the result with astype(int):

a = df[['B']].rolling('60s').apply(fn).astype(int)
print(a)
                     B
2013-01-01 09:01:00  1
2013-01-01 09:01:32  2
2013-01-01 09:02:03  2
2013-01-01 09:02:25  3
2013-01-01 09:03:06  2

If there is no numeric column, you can create one:

a = df.assign(B=np.arange(len(df.index)))[['B']].rolling('60s').apply(fn).astype(int)
print(a)
                     B
2013-01-01 09:01:00  1
2013-01-01 09:01:32  2
2013-01-01 09:02:03  2
2013-01-01 09:02:25  3
2013-01-01 09:03:06  2

Or assign the column in place first:

df['B'] = np.arange(len(df.index))
a = df[['B']].rolling('60s').apply(fn).astype(int)
print(a)
                     B
2013-01-01 09:01:00  1
2013-01-01 09:01:32  2
2013-01-01 09:02:03  2
2013-01-01 09:02:25  3
2013-01-01 09:03:06  2
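
Note that B holds a different value in every row, so the snippets above count the rows in each window (3 at 09:02:25, where the question expects 2 distinct values of A). One possible way to get the distinct count of A itself is to encode the strings as integer codes with `pd.factorize` and roll over the codes. This is a sketch under that assumption, not necessarily the fastest approach; the column name `A_code` is made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question (column A only)
df = pd.DataFrame(
    {'A': ['a', 'b', 'a', 'b', 'e']},
    index=pd.to_datetime([
        '2013-01-01 09:01:00',
        '2013-01-01 09:01:32',
        '2013-01-01 09:02:03',
        '2013-01-01 09:02:25',
        '2013-01-01 09:03:06',
    ]),
)

# Encode each distinct string as an integer code (a -> 0, b -> 1, e -> 2).
# 'A_code' is a hypothetical helper column, not part of the original data.
codes, uniques = pd.factorize(df['A'])
df['A_code'] = codes

# Count the distinct codes inside each 60-second window.
distinct = (
    df['A_code']
    .rolling('60s')
    .apply(lambda x: len(np.unique(x)), raw=True)
    .astype(int)
)
print(distinct)
```

On the sample data this yields 1, 2, 2, 2, 2, which matches the result expected in the question. By default `pd.factorize` encodes missing values as -1, so a NaN in A would be counted as one more distinct value unless it is filtered out first.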

EDIT1: to run the rolling count separately within each group of A:

df['B'] = np.arange(len(df.index))
a = df.groupby('A')[['B']].rolling('60s').apply(fn).astype(int)
print(a)
                       B
A                       
a 2013-01-01 09:01:00  1
  2013-01-01 09:02:03  1
b 2013-01-01 09:01:32  1
  2013-01-01 09:02:25  2
e 2013-01-01 09:03:06  1

Answer 2

Try it this way:

In [40]: import numpy as np
    ...: import pandas as pd

In [41]: fn = lambda x: len(np.unique(x)) 
    ...: df = pd.DataFrame({'A':['a', 'b', 'c', 'd', 'e'], 'B': [0, 1, 2, 3, 4]},
    ...:                 index = [pd.Timestamp('20130101 09:01:00'),
    ...:                          pd.Timestamp('20130101 09:01:32'),
    ...:                          pd.Timestamp('20130101 09:02:03'),
    ...:                          pd.Timestamp('20130101 09:02:25'),
    ...:                          pd.Timestamp('20130101 09:03:06')])

In [42]: df[['B']] = df[['B']].rolling('60s').apply(fn).astype(int)

In [43]: df[['']] = df[['B']]

In [44]: df[['']]
Out[44]: 

2013-01-01 09:01:00  1
2013-01-01 09:01:32  2
2013-01-01 09:02:03  2
2013-01-01 09:02:25  3
2013-01-01 09:03:06  2


Answer 1: score 1, accepted, 2017-08-16 07:14:17

Answer 2: score -1, 2017-08-16 07:15:22
