Collapse observation rows based on first and last occurrence in R

Question

I have a dataset like this.

ID        EQP_ID         DATE           ENTRY     EXIT
10        1232           10/01/2018     0058      NA
10        8123           10/01/2018     NA        0059
11        8231           10/02/2018     0063      NA
11        233            10/03/2018     0064      NA
11        2512           10/04/2018     NA        0099
11        2111           10/05/2018     NA        1000

I want to collapse the observations so that, for each ID, the earliest row with an ENTRY value is combined with the latest row with an EXIT value, and I also keep the EQP_ID associated with the exit record:

ID       EQP_ID    ENTRY       EXIT
10       8123      0058        0059
11       2111      0063        1000

I'm fairly new to R, and this was complicated enough that I couldn't think of a good way to do it without resorting to a loop, whose performance is predictably not very good.

Edit

I think this does it, but I'd still be curious whether more experienced folks have a better answer:

group_by(dataset, ID) %>% 
  arrange(ENTRY) %>% 
  summarize(ENTRY = first(ENTRY), EXIT = last(EXIT), EQP_ID = last(EQP_ID))

Answer 1

One option with data.table:

library(data.table)

# create example data (dates are illustrative, one per row)
dt <- data.table(
    id = c(10, 10, 11, 11, 11, 11),
    date = seq(as.Date("2018-10-01"), as.Date("2018-10-06"), by = "day"),
    entry = c(58, NA, 63, 64, NA, NA),
    exit = c(NA, 59, NA, NA, 99, 1000)
)

# sort by id and date so that position within a group reflects time order
setorder(dt, id, date)

# number rows by id
dt[, num := seq_len(.N), by = id]

# get first-entry and last-exit values by id
# (assumes the first row of each id has an entry and the last row has an exit)
dt[, keepentry := entry[1], by = id]
dt[, keepexit  := exit[.N], by = id]

# keep one row per id
dt[num == 1, .(id, keepentry, keepexit)]

Not my most elegant work, but it will get the job done.
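
As a possible shortening of the steps above, the numbering and the two helper columns can be folded into a single grouped aggregation that also returns the equipment ID from the exit row. This is only a sketch: the eqp_id column and the dates are copied from the question's sample rather than from this answer, and it assumes that, after sorting, the first row of each id carries the entry and the last row carries the exit.

```r
library(data.table)

# Example data mirroring the question's sample; eqp_id is added for illustration
dt <- data.table(
  id     = c(10, 10, 11, 11, 11, 11),
  eqp_id = c(1232, 8123, 8231, 233, 2512, 2111),
  date   = as.Date(c("2018-10-01", "2018-10-01", "2018-10-02",
                     "2018-10-03", "2018-10-04", "2018-10-05")),
  entry  = c(58, NA, 63, 64, NA, NA),
  exit   = c(NA, 59, NA, NA, 99, 1000)
)

# Sort once so that row position within each id reflects time order
setorder(dt, id, date)

# One aggregation per id instead of helper columns plus a filter
res <- dt[, .(
  eqp_id = eqp_id[.N],  # equipment on the last row of the group
  entry  = entry[1],    # entry on the first row of the group
  exit   = exit[.N]     # exit on the last row of the group
), by = id]

res
```

If the last row of a group could lack an exit value, the positional indexing would return NA, so the assumption is worth checking on real data.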

Answer 2

Using dplyr::first and dplyr::last we can do the following. Another option is to use min and max.

library(dplyr)
df %>% group_by(ID) %>% 
       summarise(EQP_ID=dplyr::last(EQP_ID), First=dplyr::first(ENTRY),Last=dplyr::last(EXIT))


# A tibble: 2 x 4
     ID EQP_ID First  Last
  <int>  <int> <int> <int>
1    10   8123    58    59
2    11   2111    63  1000
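
The min and max alternative mentioned above is not spelled out in the answer, so here is one way it might look. It is a sketch resting on an assumption that holds in the sample but may not hold in general: ENTRY and EXIT values increase over time, so the smallest ENTRY is the earliest and the largest EXIT is the latest. The data frame is rebuilt inline from the question's sample so the block runs on its own.

```r
library(dplyr)

# Sample data from the question (DATE omitted, it is not needed here)
df <- data.frame(
  ID     = c(10L, 10L, 11L, 11L, 11L, 11L),
  EQP_ID = c(1232L, 8123L, 8231L, 233L, 2512L, 2111L),
  ENTRY  = c(58L, NA, 63L, 64L, NA, NA),
  EXIT   = c(NA, 59L, NA, NA, 99L, 1000L)
)

res <- df %>%
  group_by(ID) %>%
  summarise(
    EQP_ID = EQP_ID[which.max(EXIT)],   # which.max() skips NA values
    First  = min(ENTRY, na.rm = TRUE),  # smallest non-missing entry
    Last   = max(EXIT, na.rm = TRUE)    # largest non-missing exit
  )

res
```

Unlike first and last, this version does not depend on row order, but a group with no non-missing ENTRY or EXIT values would need separate handling.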

Answer 3

This solution uses dplyr. First, define the data frame.

df <- read.table(text = "ID        EQP_ID         DATE           ENTRY     EXIT
10        1232           10/01/2018     0058      NA
10        8123           10/01/2018     NA        0059
11        8231           10/02/2018     0063      NA
11        233            10/03/2018     0064      NA
11        2512           10/04/2018     NA        0099
11        2111           10/05/2018     NA        1000", header = TRUE)

Next, group by ID and take either the first or last value of each variable in the group using head or tail, respectively.

df %>% 
  group_by(ID) %>% 
  summarise(EQP_ID = tail(EQP_ID, 1),
            ENTRY = head(ENTRY, 1),
            EXIT = tail(EXIT, 1))

This gives:

# # A tibble: 2 x 4
#       ID EQP_ID ENTRY  EXIT
#    <int>  <int> <int> <int>
# 1    10   8123    58    59
# 2    11   2111    63  1000
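
The head and tail approach depends on the rows already being in time order, with the entry on the first row and the exit on the last row of each group. Where that cannot be guaranteed, a variant along the following lines might be safer. It is a sketch, not part of the original answer: it assumes DATE is in month/day/year format, sorts by it explicitly, and drops missing values before picking the first entry and last exit.

```r
library(dplyr)

df <- read.table(text = "ID EQP_ID DATE ENTRY EXIT
10 1232 10/01/2018 0058 NA
10 8123 10/01/2018 NA 0059
11 8231 10/02/2018 0063 NA
11 233 10/03/2018 0064 NA
11 2512 10/04/2018 NA 0099
11 2111 10/05/2018 NA 1000", header = TRUE)

res <- df %>%
  mutate(DATE = as.Date(DATE, format = "%m/%d/%Y")) %>%  # assumed date format
  arrange(ID, DATE) %>%                                  # explicit time order
  group_by(ID) %>%
  summarise(
    EQP_ID = last(EQP_ID[!is.na(EXIT)]),  # equipment on the latest exit row
    ENTRY  = first(ENTRY[!is.na(ENTRY)]), # earliest non-missing entry
    EXIT   = last(EXIT[!is.na(EXIT)])     # latest non-missing exit
  )

res
```

Rows that share the same ID and DATE keep their original relative order here, so same-day records would need a finer timestamp to be ordered reliably.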

Answer 1
0 2018-08-22 23:36:58

Answer 2
0 accepted 2018-08-22 23:41:59

Answer 3
0 2018-08-22 23:42:37
