How do I get the data between two defined characters in a string? (R)

Question

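I have seen this question answered here for other languages, but I cannot find a solution in R:

I have a dataset in which the order of interactions is critical, and the device can be in one of two states depending on how far the experiment has progressed. The hardware does not record the current state, however, so the only way to separate the states is to filter the data between the "Start" and "Stop" interactions. State 1 is everything outside Start-Stop; state 2 is everything between Start and Stop.

My data is formatted like this: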

Time       Individual    Interaction
11:57:31   XX002         2
12:00:00   XX123         Start
12:00:03   XX123         1
12:00:37   XX334         2
12:01:00   NA            Stop
12:04:12   XX441         2
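For anyone who wants to reproduce this, the sample above could be rebuilt roughly as follows. This is only a sketch: the question does not say how the columns are typed, so storing Time and Interaction as character vectors is an assumption.

```r
# Rebuild the six sample rows shown above.
# Assumption: all three columns are plain character vectors
# (stringsAsFactors = FALSE prevents conversion to factors on older R versions).
df <- data.frame(
  Time        = c("11:57:31", "12:00:00", "12:00:03",
                  "12:00:37", "12:01:00", "12:04:12"),
  Individual  = c("XX002", "XX123", "XX123", "XX334", NA, "XX441"),
  Interaction = c("2", "Start", "1", "2", "Stop", "2"),
  stringsAsFactors = FALSE
)

str(df)
```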

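How can I filter the data to get two separate data frames, one with all events outside Start-Stop and one with all events between Start and Stop? Ideally it would search the data chronologically for a "Start" interaction, filter out everything between it and the next "Stop", and repeat (since there can sometimes be several "Start" interactions before the next "Stop").

For this example, the result would be: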

Time       Individual    Interaction
11:57:31   XX002         2
12:04:12   XX441         2

and

Time       Individual    Interaction
12:00:00   XX123         Start
12:00:03   XX123         1
12:00:37   XX334         2
12:01:00   NA            Stop

Thanks in advance.

Answer 1

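Using cumsum, we keep a running count of the Start and Stop events seen so far. Subtracting the two counts gives 1 between Start and Stop and 0 outside. Unfortunately the Stop row itself also gets a 0, so we need lag() (the dplyr version, which accepts a default) to pull the Stop row into dfin as well.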

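library(dplyr)  # lag() with a default= argument comes from dplyr, not stats

# running count of Start minus Stop: 1 inside a block, 0 outside
z = cumsum(df$Interaction=="Start")-cumsum(df$Interaction=="Stop")
# the Stop row has z == 0 but follows a 1, so move it into the block
sep = ifelse(z==0 & lag(z,default=z[1])==1,1,z)
dfin=df[sep==1,]
dfout=df[sep==0,]

> dfin
      Time Individual Interaction
3 12:00:00      XX123       Start
4 12:00:03      XX123           1
5 12:00:37      XX334           2
6 12:01:00       <NA>        Stop
> dfout
      Time Individual Interaction
2 11:57:31      XX002           2
7 12:04:12      XX441           2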

Using a dplyr pipeline:

df2=df%>%mutate(n=cumsum(Interaction=="Start")-cumsum(Interaction=="Stop"))%>%
  mutate(n=ifelse(n==0 & lag(n,default=n[1])==1,1,n))%>%split(.$n)
> df2
$`0`
      Time Individual Interaction n
1 11:57:31      XX002           2 0
6 12:04:12      XX441           2 0

$`1`
      Time Individual Interaction n
2 12:00:00      XX123       Start 1
3 12:00:03      XX123           1 1
4 12:00:37      XX334           2 1
5 12:01:00       <NA>        Stop 1
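One caveat with the cumsum difference: the question mentions that several Start rows can occur before the next Stop, and in that case the difference climbs above 1 and no longer comes back to 0 at the Stop. A possible workaround is a small explicit loop that only tracks whether a block is currently open. This is a sketch, not part of the original answer; `flag_inside` is a hypothetical helper name, and it assumes the rows are already in chronological order.

```r
# Hypothetical helper (not from the answer): TRUE for every row from a Start
# up to and including the next Stop, however many Starts occur in between.
flag_inside <- function(interaction) {
  inside <- logical(length(interaction))
  open <- FALSE
  for (i in seq_along(interaction)) {
    # isTRUE() keeps the loop from failing on NA values
    if (isTRUE(interaction[i] == "Start")) open <- TRUE
    inside[i] <- open                      # the Stop row still counts as inside
    if (isTRUE(interaction[i] == "Stop")) open <- FALSE
  }
  inside
}

# Made-up vector with two Starts before a single Stop
x <- c("2", "Start", "1", "Start", "2", "Stop", "2")
inside <- flag_inside(x)
inside
```

With a data frame like the one in the question, `df[flag_inside(df$Interaction), ]` and `df[!flag_inside(df$Interaction), ]` would then give the two subsets.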

Answer 2

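You could try finding the times of the Start and Stop interactions and then subsetting the data frame based on them. Note that this assumes the data contains exactly one Start and one Stop: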

time_start <- df$Time[df$Interaction == "Start"]
time_stop  <- df$Time[df$Interaction == "Stop"]

df_in <- df[df$Time >= time_start & df$Time <= time_stop,]
df_out <- df[df$Time < time_start | df$Time > time_stop,]

df_in
      Time Individual Interaction
2 12:00:00      XX123       Start
3 12:00:03      XX123           1
4 12:00:37      XX334           2
5 12:01:00       <NA>        Stop

df_out
      Time Individual Interaction
1 11:57:31      XX002           2
6 12:04:12      XX441           2
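If the data holds more than one Start/Stop pair, comparing against `time_start` and `time_stop` directly would recycle the vectors rather than test each window. One way to extend the idea is to test every window separately and combine the results. This is a rough sketch: `in_any_window` is a hypothetical helper, and it assumes each Start is matched by exactly one later Stop, that all times fall on the same day, and that they are zero-padded HH:MM:SS strings so that plain string comparison orders them correctly.

```r
# Hypothetical helper: TRUE for times that fall inside any [start, stop] window.
# Assumes starts[i] pairs with stops[i].
in_any_window <- function(time, starts, stops) {
  stopifnot(length(starts) == length(stops))
  # one logical vector per window
  hits <- Map(function(s, e) time >= s & time <= e, starts, stops)
  # OR the windows together, starting from all FALSE
  Reduce(`|`, hits, rep(FALSE, length(time)))
}

# Made-up times with two separate windows
time <- c("11:57:31", "12:00:00", "12:00:03", "12:01:00",
          "12:04:12", "12:05:00", "12:06:00", "12:07:30")
inside <- in_any_window(time,
                        starts = c("12:00:00", "12:05:00"),
                        stops  = c("12:01:00", "12:06:00"))
inside
```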

Solution 1
1, accepted, 2019-09-02 13:53:19

Solution 2
0, 2019-09-02 13:45:24
