Concatenating data frame rows based on column condition
For subsequent discussion, I will refer to the example data frame below:
Now, what I wish to achieve is to group all the packet times that are similar, i.e. all the 7s, 12s, etc. Furthermore, the PacketTime field should contain the difference between the maximum and minimum times in the group (max(PacketTime) - min(PacketTime)), and the FrameLen, IPLen and TCPLen fields should be lists of all the values that correspond to the grouped time. For example, for the 7s group, FrameLen would contain c(304, 276, 276).
My solution for the above is as follows:
library(dplyr)

df <- packets %>%
  group_by(round(PacketTime)) %>%
  summarise(
    PTime = max(PacketTime) - min(PacketTime),
    FLen = list(FrameLen),
    ILen = list(IPLen),
    Movement = 0
  ) %>%
  rename(PacketTime = PTime) %>%
  rename(FrameLen = FLen) %>%
  rename(IPLen = ILen)
df$"round(PacketTime)" <- NULL # Remove the group_by column
However, some of these groups cross over a whole-second boundary (i.e. the 1480s group also includes part of the 1481s). What makes this a little easier, in some regard, is that the groups are separated from each other by a 5s timing window (via Python time.sleep(5)).
How can I achieve the previous result while relying only on the 5s difference between the groups, so that the crossover is also taken into account?
EDIT: As suggested by Ben, here is the dput() of my dataframe df[1:20,]:
structure(list(PacketTime = c(7.083779, 7.147268, 7.147462, 12.084768,
12.153246, 12.153951, 17.095972, 17.159268, 17.159876, 22.11384,
22.176926, 22.177467, 27.134427, 27.199108, 27.200064, 32.144442,
32.208648, 32.20922, 37.144255, 37.205622), FrameLen = c(304L,
276L, 276L, 304L, 276L, 276L, 304L, 276L, 276L, 304L, 276L, 276L,
304L, 276L, 276L, 304L, 276L, 276L, 304L, 276L), IPLen = c(300L,
272L, 272L, 300L, 272L, 272L, 300L, 272L, 272L, 300L, 272L, 272L,
300L, 272L, 272L, 300L, 272L, 272L, 300L, 272L), TCPLen = c(260L,
232L, 232L, 260L, 232L, 232L, 260L, 232L, 232L, 260L, 232L, 232L,
260L, 232L, 232L, 260L, 232L, 232L, 260L, 232L), Movement = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA,
20L), class = "data.frame")
Here is a base R solution using aggregate + transform:
u <- aggregate(
  . ~ PacketTime,
  transform(df,
    PTime = ave(PacketTime, trunc(PacketTime),
      FUN = function(x) diff(range(x))),
    PacketTime = trunc(PacketTime)
  ),
  c
)
dfout <- transform(u, PTime = sapply(PTime, unique))
which gives
> dfout
PacketTime FrameLen IPLen TCPLen Movement PTime
1 7 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.063683
2 12 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.069183
3 17 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.063904
4 22 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.063627
5 27 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.065637
6 32 304, 276, 276 300, 272, 272 260, 232, 232 0, 0, 0 0.064778
7 37 304, 276 300, 272 260, 232 0, 0 0.061367
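Grouping on trunc(PacketTime) (or round(PacketTime)) still splits a burst that straddles a whole-second boundary, which is the crossover case described in the question. A minimal base R sketch of one way around this is to start a new group whenever the gap to the previous packet exceeds a threshold. The sample times near 1480s, the name gap_threshold and its value of 1 second are assumptions made for illustration, and the rows are assumed to be sorted by PacketTime.

```r
# Hypothetical data: three bursts; the last one straddles a whole-second
# boundary (1480.97 vs 1481.03), like the crossover case in the question.
packets <- data.frame(
  PacketTime = c(7.083779, 7.147268, 7.147462,
                 12.084768, 12.153246, 12.153951,
                 1480.97, 1481.03, 1481.04),
  FrameLen = c(304L, 276L, 276L, 304L, 276L, 276L, 304L, 276L, 276L)
)

# trunc() puts the last burst into two different groups.
trunc(packets$PacketTime)  # 7 7 7 12 12 12 1480 1481 1481

# Assumed threshold: packets within a burst are much less than 1 s apart,
# while bursts are roughly 5 s apart.
gap_threshold <- 1

# A new burst starts at the first row and wherever the gap exceeds the threshold.
burst <- cumsum(c(TRUE, diff(packets$PacketTime) > gap_threshold))

# Summarise each burst: time span plus a list column of frame lengths.
by_burst <- split(packets, burst)
result <- data.frame(
  Burst = as.integer(names(by_burst)),
  PacketTime = vapply(by_burst, function(d) diff(range(d$PacketTime)), numeric(1))
)
result$FrameLen <- lapply(by_burst, function(d) d$FrameLen)
result
```

Because the grouping key depends only on the gaps between consecutive packets, it does not matter where a burst falls relative to whole seconds.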
One approach is to use seq and cut. Create a sequence from your minimum to maximum times, every 5 seconds. Then, use cut to put your times into intervals. You can use the interval itself as the label (for example, 7-12 sec) by omitting the labels argument, or just use the lower time of the interval (7 sec) as done below.
library(tidyverse)

my_breaks <- seq(trunc(min(packets$PacketTime)), max(packets$PacketTime) + 5, 5)
packets$Interval <- cut(packets$PacketTime,
  breaks = my_breaks,
  labels = my_breaks[-length(my_breaks)],
  right = FALSE
)

packets %>%
  group_by(Interval) %>%
  summarise(
    PTime = max(PacketTime) - min(PacketTime),
    FLen = list(FrameLen),
    ILen = list(IPLen),
    Movement = 0
  ) %>%
  rename(PacketTime = PTime) %>%
  rename(FrameLen = FLen) %>%
  rename(IPLen = ILen)
Output
# A tibble: 7 x 5
Interval PacketTime FrameLen IPLen Movement
<fct> <dbl> <list> <list> <dbl>
1 7 0.0637 <int [3]> <int [3]> 0
2 12 0.0692 <int [3]> <int [3]> 0
3 17 0.0639 <int [3]> <int [3]> 0
4 22 0.0636 <int [3]> <int [3]> 0
5 27 0.0656 <int [3]> <int [3]> 0
6 32 0.0648 <int [3]> <int [3]> 0
7 37 0.0614 <int [2]> <int [2]> 0
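To illustrate the two labelling options mentioned above, here is a small sketch on a handful of times taken from the question's data. The variable names times, by_interval and by_lower are made up for this example.

```r
# A few packet times from the question's data.
times <- c(7.083779, 7.147268, 12.084768, 17.095972)

# Breaks every 5 seconds, starting at the truncated minimum.
my_breaks <- seq(trunc(min(times)), max(times) + 5, 5)  # 7 12 17 22

# Without `labels`, each level is named after its interval.
by_interval <- cut(times, breaks = my_breaks, right = FALSE)
levels(by_interval)  # "[7,12)" "[12,17)" "[17,22)"

# With `labels`, each level is named after the lower bound of its interval.
by_lower <- cut(times, breaks = my_breaks,
                labels = my_breaks[-length(my_breaks)], right = FALSE)
levels(by_lower)  # "7" "12" "17"
```

In both cases the result is a factor, so the lower-bound labels are character strings; as.numeric(as.character(by_lower)) would be needed to get them back as numbers.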