
Count distinct strings in rolling window using pandas + python (with a condition)

I want to count the distinct port numbers that occur in the current row and the 5 previous rows (a sliding window), considering only the rows in that window that have the same address as the current row. For instance, if the input is (CSV file):

ID      PORT     ADDRESS
1        21       ad3 
2        22       ad1  
3        23       ad2
4        25       ad2 
5        25       ad1
6        22       ad1 
7        22       ad1
8        21       ad4

The output should be:

ID      PORT     ADDRESS      COUNT_DISC_PORT
1        21       ad3        -
2        22       ad1        -
3        23       ad2        - 
4        25       ad2        - 
5        25       ad1        - 
6        22       ad1        2 
7        23       ad1        3
8        21       ad4        1 

I have read the documentation for the rolling function in pandas and tried combining groupby and rolling, with no success.

I am using Python 3.7 and pandas 0.22. Any feedback would be appreciated.

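# Assumes df has the default 0..n-1 integer index and that PORT is the column to count.
for index, row in df.iterrows():
    small_df = df.iloc[max(index - 5, 0):index + 1]  # current row plus up to 5 rows before it
    df.loc[index, 'uniques'] = small_df['PORT'].nunique()  # distinct ports in that window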

This is my quick take on it.

OK, it seems your input data does not match the df you showed us as the expected output (for example, ID 7 has PORT 22 in the input but 23 in the output).

df.groupby('ADDRESS').PORT.apply(lambda x : pd.Series(x).rolling(5,min_periods=1).apply(lambda y: len(set(y))))
Out[844]: 
0    1.0
1    1.0
2    1.0
3    2.0
4    2.0
5    2.0
Name: PORT, dtype: float64
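
Note that the grouped rolling call above counts over the last 5 rows of each address, not over the last 5 rows of the whole file. The following is a minimal sketch of the windowing the question describes, written in plain Python so that it does not depend on a particular pandas version. The names `count_distinct_ports` and `lookback` are hypothetical and introduced only for illustration, and returning None for rows with fewer than 5 predecessors is an assumption based on the "-" entries in the expected output.

```python
from collections import deque


def count_distinct_ports(ports, addresses, lookback=5):
    """For each row, count the distinct ports among the current row and the
    `lookback` previous rows that share the current row's address.
    Rows with fewer than `lookback` predecessors get None."""
    window = deque(maxlen=lookback + 1)  # holds (port, address) pairs
    counts = []
    for port, address in zip(ports, addresses):
        window.append((port, address))  # the oldest pair drops out automatically
        if len(window) <= lookback:
            counts.append(None)  # window not full yet ("-" in the question)
        else:
            counts.append(len({p for p, a in window if a == address}))
    return counts


# Input table from the question.
ports = [21, 22, 23, 25, 25, 22, 22, 21]
addresses = ["ad3", "ad1", "ad2", "ad2", "ad1", "ad1", "ad1", "ad4"]
counts = count_distinct_ports(ports, addresses)
print(counts)  # [None, None, None, None, None, 2, 2, 1]
```

With the input as posted, ID 7 gets 2 rather than 3, because its port is 22 in the input table and 23 only in the expected output. If df holds the CSV, the result could be attached with df['COUNT_DISC_PORT'] = count_distinct_ports(df['PORT'], df['ADDRESS']).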
