How to Count the Number of Distinct Values in a Data Frame Column with a Condition in R
I have a data frame that looks like this:
date timestamp transfer ID IP Address Username Encryption File Bytes Speed DateTimeStamp
1 20160525 08:22:06.838 F798256B 10.199.194.38:57708 wei2dt - "" 264 "1.62 seconds (1.30 kilobits/sec)" 20160525 08:22:06.838
2 20160525 08:28:26.920 F798256C 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.tmp" 69 "0.29 seconds (1.93 kilobits/sec)" 20160525 08:28:26.920
3 20160525 08:28:26.923 F798256D 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.met" 0 "Unable to stat isi_audit_log.dmp-sv.met: No such file or directory" 20160525 08:28:26.923
4 20160525 08:28:26.933 F798256E 10.19.105.15:57708 wei2dt - "CG0009 1364_GT_report.txt" 34 "0.01 seconds (34.0 kilobits/sec)" 20160525 08:28:26.933
I want to count the number of users (usernames) that were online at a given time. Essentially, I want to check every five minutes or so how many users were active.
I need to use the DateTimeStamp column to create my intervals and use it as a condition to count the number of distinct users in each period. I've tried using a while loop to do something of the sort, but it did not work.
Are there any suggestions on how I should go about this?
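For reference, the binning described in the question does not need a while loop: each parsed timestamp can be floored to the start of its five-minute window on the clock, and the distinct usernames counted per window in one vectorised call. This base R sketch assumes DateTimeStamp is a character string in the printed format and that the log times are UTC; `transfers`, `window_start` and `active_users` are illustrative names, not from the question.

```r
# Sample rows from the question (only the columns needed here);
# DateTimeStamp is assumed to be stored as a character string
transfers <- data.frame(
  Username = rep("wei2dt", 4),
  DateTimeStamp = c("20160525 08:22:06.838", "20160525 08:28:26.920",
                    "20160525 08:28:26.923", "20160525 08:28:26.933"),
  stringsAsFactors = FALSE
)

# Parse to POSIXct; %OS accepts fractional seconds (UTC assumed here)
ts <- as.POSIXct(transfers$DateTimeStamp, format = "%Y%m%d %H:%M:%OS", tz = "UTC")

# Floor each timestamp to the start of its 300-second window (:00, :05, :10, ...)
window_start <- as.POSIXct(floor(as.numeric(ts) / 300) * 300,
                           origin = "1970-01-01", tz = "UTC")

# Count distinct usernames per window; tapply replaces the explicit loop
active_users <- tapply(transfers$Username,
                       format(window_start, "%Y-%m-%d %H:%M"),
                       function(u) length(unique(u)))
print(active_users)
```

Here the windows are 08:20 and 08:25, each with one distinct user. Windows with no transfers at all do not appear in the result, because tapply only groups the values that are present.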
With dplyr:
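library(dplyr)
# assumes DateTimeStamp is already a POSIXct date-time; cut() bins it into 5-minute intervals
df %>% mutate(timeInt = cut(DateTimeStamp, breaks = "5 min")) %>%
  group_by(timeInt) %>% summarise(numberUniqueUsers = length(unique(Username)))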
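A self-contained run of the answer on the four sample rows, assuming DateTimeStamp is stored as a character string (as the printed output suggests) and that the times are UTC. cut() requires a date-time object, so the column is parsed with as.POSIXct() first. The `.drop = FALSE` argument and the switch to n_distinct() are additions to the original answer: the former keeps intervals with no activity (reported as 0), the latter is dplyr's equivalent of length(unique()). `transfers` and `active_users` are illustrative names.

```r
library(dplyr)

# Sample rows from the question (only the columns needed here);
# DateTimeStamp is assumed to be stored as a character string
transfers <- data.frame(
  Username = rep("wei2dt", 4),
  DateTimeStamp = c("20160525 08:22:06.838", "20160525 08:28:26.920",
                    "20160525 08:28:26.923", "20160525 08:28:26.933"),
  stringsAsFactors = FALSE
)

active_users <- transfers %>%
  # cut() needs a date-time, so parse the string first (UTC assumed), then bin
  mutate(DateTimeStamp = as.POSIXct(DateTimeStamp, format = "%Y%m%d %H:%M:%OS",
                                    tz = "UTC"),
         timeInt = cut(DateTimeStamp, breaks = "5 min")) %>%
  # .drop = FALSE keeps 5-minute intervals in which nobody was active
  group_by(timeInt, .drop = FALSE) %>%
  summarise(numberUniqueUsers = n_distinct(Username))

print(active_users)
```

Note that cut() starts the first interval at the minute of the earliest record rather than at a round clock time, so for this sample the intervals begin at 08:22:00 and 08:27:00, each with one distinct user.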