Count distinct strings in rolling window using pandas + python (with a condition)
I want to calculate the number of distinct port numbers between the current row and the 5 previous rows (a sliding window), counting only the rows that have the same address as the current row. For instance,
If the input (a CSV file) is:
ID PORT ADDRESS
1 21 ad3
2 22 ad1
3 23 ad2
4 25 ad2
5 25 ad1
6 22 ad1
7 22 ad1
8 21 ad4
The output should be:
ID PORT ADDRESS COUNT_DISC_PORT
1 21 ad3 -
2 22 ad1 -
3 23 ad2 -
4 25 ad2 -
5 25 ad1 -
6 22 ad1 2
7 23 ad1 3
8 21 ad4 1
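To reproduce the example, here is a minimal sketch of loading the sample input. It assumes the file is whitespace-separated as displayed above; a genuinely comma-separated file would use the default `sep=','` instead. The file name `ports.csv` is hypothetical, and the sketch reads the text from a string so that it runs on its own.

```python
import io

import pandas as pd

# Sample input exactly as shown in the question (whitespace-separated).
# For a real file, pass a path such as the hypothetical "ports.csv" instead.
SAMPLE = """ID PORT ADDRESS
1 21 ad3
2 22 ad1
3 23 ad2
4 25 ad2
5 25 ad1
6 22 ad1
7 22 ad1
8 21 ad4
"""

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
```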
I have read the pandas documentation on rolling windows and tried combining groupby with rolling, without success.
I am using Python 3.7 and pandas 0.22. Any feedback would be appreciated.
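One likely reason groupby plus rolling does not fit: after `df.groupby('ADDRESS')`, a rolling window is built from that address's own preceding rows, whereas the rule here looks at the current row and the 5 rows before it in the whole frame and then keeps only those sharing the current address. Below is a minimal sketch that avoids rolling altogether. It assumes the column names from the example and a frame already in row order; it lines up lagged copies of PORT with `shift`, blanks out lags whose ADDRESS differs, and counts distinct values per row. The function name `count_distinct_ports` is hypothetical.

```python
import numpy as np
import pandas as pd


def count_distinct_ports(df: pd.DataFrame, n_prev: int = 5) -> pd.Series:
    """Distinct PORT values among the current row and the n_prev rows before it,
    counting only rows whose ADDRESS equals the current row's ADDRESS."""
    lagged = {}
    for k in range(n_prev + 1):  # lag 0 is the current row itself
        same_addr = df["ADDRESS"].shift(k) == df["ADDRESS"]
        # Keep the lagged port only where the address matches; otherwise NaN.
        lagged[k] = df["PORT"].shift(k).where(same_addr)
    counts = pd.DataFrame(lagged).nunique(axis=1)  # NaN is not counted
    # Rows without n_prev predecessors get NaN, like the "-" in the expected output.
    return counts.where(np.arange(len(df)) >= n_prev)


df = pd.DataFrame({
    "ID": range(1, 9),
    "PORT": [21, 22, 23, 25, 25, 22, 22, 21],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad1", "ad1", "ad1", "ad4"],
})
df["COUNT_DISC_PORT"] = count_distinct_ports(df)
```

With the sample data this yields NaN for IDs 1 to 5 and 2, 2, 1 for IDs 6 to 8. ID 7 comes out as 2 rather than the expected 3 because the input lists its port as 22; with port 23 it would be 3.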
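for index, row in df.iterrows():
    # Current row plus the 5 rows before it (label slice; assumes the default RangeIndex)
    small_df = df.loc[index - 5:index]
    # Count distinct ports only among rows that share the current row's address
    df.loc[index, 'uniques'] = small_df.loc[small_df['ADDRESS'] == row['ADDRESS'], 'PORT'].nunique()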
That is my brief outline.
OK, it seems like your data input doesn't match the df you showed us (for example, ID 7 has PORT 22 in the input but 23 in the expected output).
df.groupby('ADDRESS').PORT.apply(lambda x: pd.Series(x).rolling(5, min_periods=1).apply(lambda y: len(set(y))))
Out[844]:
0 1.0
1 1.0
2 1.0
3 2.0
4 2.0
5 2.0
Name: PORT, dtype: float64
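Note that this approach counts something slightly different from the rule in the question: inside each ADDRESS group, `rolling(5, min_periods=1)` covers that address's own last 5 rows (including the current one), however far apart they are in the file, and `min_periods=1` fills the first rows instead of leaving them as "-". Below is a sketch of the same per-address idea written with `transform`, which keeps the original row order. Passing `raw=True` assumes pandas 0.23 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(1, 9),
    "PORT": [21, 22, 23, 25, 25, 22, 22, 21],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad1", "ad1", "ad1", "ad4"],
})

# Per-address window: the current row plus up to 4 earlier rows of the SAME address.
# transform returns a result aligned with df's original index.
per_address = df.groupby("ADDRESS")["PORT"].transform(
    lambda s: s.rolling(5, min_periods=1).apply(lambda y: len(set(y)), raw=True)
)
```

On the sample data this gives 1, 1, 1, 2, 2, 2, 2, 1. IDs 6 to 8 happen to agree with the question's rule, but the two diverge, for example, whenever rows of one address are more than 5 rows apart.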