
How do I summarise a dataframe based on dates?

I use R 3.5.0 on Windows 10.

I have a dataframe imported with library(openxlsx) and read.xlsx("...."). It has 100,000 rows, and part of it looks like this:

S.No Start.Date End.Date Generation    unitout     timediff
7850   42907.76 42907.77 436.158469    INSERVICE       15
7851   42907.77 42907.78 443.302793    INSERVICE       15
7852   42907.78 42907.79 437.728683    INSERVICE       15
7853   42907.79 42907.80 390.832887    INSERVICE       15
7854   42907.80 42907.81 338.917658    INSERVICE       15
7855   42907.81 42907.82 300.056018    INSERVICE       15
7856   42907.82 42907.83 266.430064    INSERVICE       15
7857   42907.83 42907.84 248.952525    INSERVICE       15
7858   42907.84 42907.85 212.913333    INSERVICE       15
7859   42907.85 42907.86  18.523060    INSERVICE       15
7860   42907.86 42907.88   1.355428 OUTOFSERVICE       15
7861   42907.88 42907.89   1.355428 OUTOFSERVICE       15
7862   42907.89 42907.90   1.355428 OUTOFSERVICE       15
7863   42907.90 42907.91   1.355428 OUTOFSERVICE       15
7864   42907.91 42907.92   1.355428 OUTOFSERVICE       15
7865   42907.92 42907.93   1.355428 OUTOFSERVICE       15
7866   42907.93 42907.94   1.355428 OUTOFSERVICE       15
7867   42907.94 42907.95   1.355428 OUTOFSERVICE       15
7868   42907.95 42907.96   1.355428 OUTOFSERVICE       15
7869   42907.96 42907.97   1.355428 OUTOFSERVICE       15
7870   42907.97 42907.98   1.355428 OUTOFSERVICE       15

I would like to summarise this to give me a dataframe of the form

1 DateTime1(42907.76) DateTime2(42907.86) INSERVICE      TIMEDIFF
2 DateTime2(42907.86) DateTime3(42907.98) OUTOFSERVICE   TIMEDIFF
3 DateTime3(42907.98) DateTime4(...)      INSERVICE      TIMEDIFF

where a new row starts every time the status changes between INSERVICE and OUTOFSERVICE, and each row captures the start date and end date of that period. Basically, I want to know from which date+time to which date+time the unit was in service and out of service, summarised in a data frame. In the above example, DateTime1 would be 42907.76 and DateTime2 would be 42907.86, since after that it goes out of service. Similarly, the next row would run from DateTime2 (42907.86) to DateTime3 (42907.98), and so on.

I have tried creating a flag to solve it, but I wasn't able to create the data frame, so I did not attach the code here. I would prefer an easy-to-understand solution with clear logic rather than packages that do everything in the backend.

PS: An additional problem is converting the Excel time format to a standard %Y%m%d%H%M format. I have read multiple threads on SO and have tried as.POSIXct, as.Date etc., but the result either changes to NA or throws an error.

Using dplyr

We compare unitout with its lagged value and take the cumulative sum of the changes. This gives an ID that increases by one at every status change, which we can group on afterwards.

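library(dplyr)
# The id increases by 1 whenever unitout differs from the previous row.
# The default must have the same type as unitout (recent dplyr versions reject default = 1).
df$id <- cumsum(df$unitout != lag(df$unitout, n = 1, default = first(df$unitout)))
# One row per run of identical status: earliest start and latest end
df %>% group_by(id, unitout) %>% summarise(Start = min(Start.Date), End = max(End.Date))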
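Since the question asks for a solution without packages, the same grouping idea can be sketched in base R with rle(), which finds runs of identical consecutive values. This is only a sketch under a few assumptions: the rows are already sorted by Start.Date, the small data frame below is illustrative toy data (not the asker's file), and summary_df, first_row and last_row are names made up for this example.

```r
# Toy data with the same column names as the question (values are illustrative)
df <- data.frame(
  Start.Date = c(42907.76, 42907.77, 42907.78, 42907.79, 42907.80),
  End.Date   = c(42907.77, 42907.78, 42907.79, 42907.80, 42907.81),
  unitout    = c("INSERVICE", "INSERVICE", "OUTOFSERVICE", "OUTOFSERVICE", "INSERVICE"),
  timediff   = c(15, 15, 15, 15, 15),
  stringsAsFactors = FALSE
)

# rle() needs an atomic vector, so convert in case unitout is a factor
runs <- rle(as.character(df$unitout))

# Row index of the last and the first row of each run
last_row  <- cumsum(runs$lengths)
first_row <- last_row - runs$lengths + 1

# One row per run: start of its first row, end of its last row,
# the status, and the total of timediff over the run
summary_df <- data.frame(
  Start    = df$Start.Date[first_row],
  End      = df$End.Date[last_row],
  unitout  = runs$values,
  timediff = vapply(
    seq_along(first_row),
    function(i) sum(df$timediff[first_row[i]:last_row[i]]),
    numeric(1)
  ),
  stringsAsFactors = FALSE
)
summary_df
```

On the toy data this gives three rows (INSERVICE, OUTOFSERVICE, INSERVICE). Summing timediff is one possible reading of the TIMEDIFF column in the question; End - Start would be another.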

You can convert your dates the following way:

  • Windows Excel: as.Date(42907.76, origin = "1899-12-30")
  • Mac Excel: as.Date(42907.76, origin = "1904-01-01")
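Note that as.Date() keeps only the date part. If the time of day is needed too, as in the question, one option is to turn the serial number into seconds and use as.POSIXct(). The sketch below assumes the Windows Excel origin and uses tz = "UTC", which is an assumption because Excel serials carry no time zone; serial and dt are names made up for this example.

```r
# Excel serial date-times taken from the question
serial <- c(42907.76, 42907.86)

# Excel counts days, POSIXct counts seconds: multiply by 86400 seconds per day.
# tz = "UTC" is an assumption; adjust it to the zone the data was recorded in.
dt <- as.POSIXct(serial * 86400, origin = "1899-12-30", tz = "UTC")

# Format as year-month-day hour:minute
format(dt, "%Y-%m-%d %H:%M")
# "2017-06-21 18:14" "2017-06-21 20:38"
```

For a file created with the 1904 date system, the origin would be "1904-01-01" instead.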

Data:

library(readr)
df <- read_table(
"S.No Start.Date  End.Date  Generation  unitout       timediff
7850   42907.76   42907.77  436.158469  INSERVICE     15
7851   42907.77   42907.78  443.302793  INSERVICE     15
7852   42907.78   42907.79  437.728683  INSERVICE     15
7853   42907.79   42907.80  390.832887  INSERVICE     15
7854   42907.80   42907.81  338.917658  INSERVICE     15
7855   42907.81   42907.82  300.056018  INSERVICE     15
7856   42907.82   42907.83  266.430064  INSERVICE     15
7857   42907.83   42907.84  248.952525  INSERVICE     15
7858   42907.84   42907.85  212.913333  INSERVICE     15
7859   42907.85   42907.86  18.523060   INSERVICE     15
7860   42907.86   42907.88  1.355428    OUTOFSERVICE  15
7861   42907.88   42907.89  1.355428    OUTOFSERVICE  15
7862   42907.89   42907.90  1.355428    OUTOFSERVICE  15
7863   42907.90   42907.91  1.355428    OUTOFSERVICE  15
7864   42907.91   42907.92  1.355428    OUTOFSERVICE  15
7865   42907.92   42907.93  1.355428    OUTOFSERVICE  15
7866   42907.93   42907.94  1.355428    OUTOFSERVICE  15
7867   42907.94   42907.95  1.355428    OUTOFSERVICE  15
7868   42907.95   42907.96  1.355428    OUTOFSERVICE  15
7869   42907.96   42907.97  1.355428    OUTOFSERVICE  15
7870   42907.97   42907.98  1.355428    OUTOFSERVICE  15")

