
Count distinct strings in rolling window using pandas

How do I count the number of unique strings in a rolling window of a pandas dataframe?

import numpy as np
import pandas as pd
a = pd.DataFrame(['a','b','a','a','b','c','d','e','e','e','e'])
a.rolling(3).apply(lambda x: len(np.unique(x)))

Output, same as the original dataframe:

    0
0   a
1   b
2   a
3   a
4   b
5   c
6   d
7   e
8   e
9   e
10  e

Expected:

    0
0   1
1   2
2   2
3   2
4   2
5   3
6   3
7   3
8   2
9   1
10  1

I think you first need to convert the values to numeric, with factorize or rank, because rolling windows only operate on numeric data. The min_periods parameter is also needed to avoid NaN at the start of the column:

a[0] = pd.factorize(a[0])[0]
print (a)
    0
0   0
1   1
2   0
3   0
4   1
5   2
6   3
7   4
8   4
9   4
10  4

b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
    0
0   1
1   2
2   2
3   2
4   2
5   3
6   3
7   3
8   2
9   1
10  1

Or:

a[0] = a[0].rank(method='dense')
print (a)
      0
0   1.0
1   2.0
2   1.0
3   1.0
4   2.0
5   3.0
6   4.0
7   5.0
8   5.0
9   5.0
10  5.0

b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
    0
0   1
1   2
2   2
3   2
4   2
5   3
6   3
7   3
8   2
9   1
10  1
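
For reference, the factorize approach above could be wrapped into one self-contained helper, roughly as sketched below. The function name rolling_distinct_count is made up for illustration and is not a pandas API, and passing raw=True (so each window arrives as a NumPy array) is an assumption that the installed pandas version supports that argument.

```python
import numpy as np
import pandas as pd


def rolling_distinct_count(values, window):
    """Count distinct values in each trailing window of a Series.

    Hypothetical helper, not part of pandas.
    """
    # Map each distinct string to an integer code so rolling() can process it.
    # Note: missing values get the code -1 and would count as one distinct value.
    codes, _ = pd.factorize(values)
    coded = pd.Series(codes, index=values.index)
    # min_periods=1 lets the first, shorter windows produce a count instead of NaN.
    counts = coded.rolling(window, min_periods=1).apply(
        lambda x: len(np.unique(x)), raw=True
    )
    return counts.astype(int)


s = pd.Series(['a', 'b', 'a', 'a', 'b', 'c', 'd', 'e', 'e', 'e', 'e'])
result = rolling_distinct_count(s, 3)
print(result)
```

On the sample data from the question this should reproduce the expected column (1, 2, 2, 2, 2, 3, 3, 3, 2, 1, 1) without overwriting the original strings.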


 