简体   繁体   中英

Finding Interval in data.table

I have a data.table A:

A = data.table(firmID = c("89031", "89031", "89031", "89031", "89031", "89031"), date = c("20170403","20170404", "20170405", "20170406", "20170407", "20170408"), TICKER = c("TSAR", "TSAR", "TSAR", "AAXN", "AAXN", "AAXN" ) )

Here it contains column "firmID", "date", "TICKER".

Now, firmID tracks the same company even their TICKER changed. Here we see TICKER changes from AAXN to TSAR starting from 20170426.

I want to make a table B like this:

B = data.table(firmID = c("89031", "89031"), STARTdate = c("20170403", "20170406"), ENDdate = c("20170405", "20170408"), TICKER = c("TSAR", "AAXN"))

So that it has a time interval when the TICKER is called respectively. This table can then be used to merge with another dataset which does not have firmID using foverlaps() function.

Any idea how one can do this without writing a for loop and compare each record to the previous one etc?

Thanks.

Given your A data.table, getting minimum and maximum of date would give each STARTdate and ENDdate .

If you group( by ) both TICKER and firmID , you might get the result you want.

library(data.table)
A[,
  .(STARTdate = min(date), ENDdate = max(date)), # range of date
  by = .(TICKER, firmID)]

#>    TICKER firmID STARTdate  ENDdate
#> 1:   TSAR  89031  20170403 20170405
#> 2:   AAXN  89031  20170406 20170408

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM