Counting distinct strings in a rolling window using pandas + Python (with a condition)

Question

I want to count the distinct port numbers within a sliding window made up of the current row and the 5 previous rows, considering only the rows in that window that share the current row's address. For instance,

If the input is (CSV file):

ID      PORT     ADDRESS
1        21       ad3 
2        22       ad1  
3        23       ad2
4        25       ad2 
5        25       ad1
6        22       ad1 
7        22       ad1
8        21       ad4

The output should be:

ID      PORT     ADDRESS      COUNT_DISC_PORT
1        21       ad3        -
2        22       ad1        -
3        23       ad2        - 
4        25       ad2        - 
5        25       ad1        - 
6        22       ad1        2 
7        23       ad1        3
8        21       ad4        1

I have read the documentation for the rolling function in pandas, and I have tried combining groupby and rolling, with no success.

I am using Python 3.7 and pandas 0.22. Any feedback would be appreciated.

Answer 1

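# Assumes df has the default integer index (0, 1, 2, ...)
for index, row in df.iterrows():
    # Current row plus up to 5 previous rows
    small_df = df.iloc[max(index - 5, 0):index + 1]
    # Count distinct ports in the window (no address filter here)
    df.loc[index, 'uniques'] = small_df['PORT'].nunique()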

This is my quick take on it.
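The loop idea can be extended to apply the address condition from the question. Below is a minimal sketch, not a definitive solution. It assumes the window is positional (the current row plus the 5 rows before it), that rows without 5 predecessors get no count, and that the data is the one shown in the expected-output table (where ID 7 has PORT 23). The helper name `count_distinct_ports` and its `lookback` parameter are made up for illustration.

```python
import pandas as pd

# Data taken from the expected-output table in the question (ID 7 has PORT 23 there).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "PORT": [21, 22, 23, 25, 25, 22, 23, 21],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad1", "ad1", "ad1", "ad4"],
})

def count_distinct_ports(frame, lookback=5):
    """Hypothetical helper: for each row, count distinct PORT values among
    the rows in the positional window (current row + `lookback` previous
    rows) that have the same ADDRESS as the current row."""
    counts = []
    for pos in range(len(frame)):
        if pos < lookback:
            # Not enough previous rows: leave the count missing.
            counts.append(float("nan"))
            continue
        window = frame.iloc[pos - lookback:pos + 1]
        same_address = window[window["ADDRESS"] == frame["ADDRESS"].iloc[pos]]
        counts.append(same_address["PORT"].nunique())
    return pd.Series(counts, index=frame.index)

df["COUNT_DISC_PORT"] = count_distinct_ports(df)
print(df)
```

On this data the last three rows get 2, 3 and 1, and the first five rows stay missing (NaN), which corresponds to the "-" entries in the question. The loop is quadratic in the worst case, so it is only meant for small frames.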

Answer 2

OK, it seems your input data does not match the DataFrame you showed us in the expected output.

df.groupby('ADDRESS').PORT.apply(lambda x : pd.Series(x).rolling(5,min_periods=1).apply(lambda y: len(set(y))))
Out[844]: 
0    1.0
1    1.0
2    1.0
3    2.0
4    2.0
5    2.0
Name: PORT, dtype: float64
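To attach this kind of per-address rolling count to the original frame as a column, one option is `groupby(...).transform`. The sketch below is an illustration under assumptions, not the answerer's exact code: it uses the input table from the question (ID 7 with PORT 22), takes the column name `COUNT_DISC_PORT` from the question, and passes `raw=True`, which needs pandas 0.23 or later. Note that the window here covers the last 5 rows of the same address, not the last 5 rows of the whole file.

```python
import pandas as pd

# Input table from the question (ID 7 has PORT 22 here).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "PORT": [21, 22, 23, 25, 25, 22, 22, 21],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad1", "ad1", "ad1", "ad4"],
})

# Within each ADDRESS group, look at up to 5 consecutive rows of that group
# and count the distinct PORT values. transform keeps the original row order.
df["COUNT_DISC_PORT"] = df.groupby("ADDRESS")["PORT"].transform(
    lambda s: s.rolling(5, min_periods=1).apply(lambda w: len(set(w)), raw=True)
)
print(df)
```

The result is a float column because rolling works on floats; cast with `.astype(int)` if integer counts are preferred.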
