I have a strange situation. I am simply trying to select some rows from a data.table.
dput(DT)
structure(list(Date = structure(c(10959, 10960, 10961, 10962,
10963, 10966, 10967, 10968, 10969, 10970, 10974, 10975, 10976,
10977, 10980, 10981, 10982, 10983, 10984, 10987), class = "Date"),
A = c(51.502148, 47.567955, 44.61731, 42.918453, 46.494991,
49.311516, 48.640915, 47.657368, 48.372677, 48.909157, 51.144493,
50.071529, 48.730328, 49.177395, 48.998569, 48.417381, 48.864449,
48.953861, 48.685623, 47.344421), AA = c(96.840897, 97.561798,
103.329002, 101.598839, 101.406601, 101.214363, 100.397339,
99.820618, 97.802101, 96.120003, 93.717003, 93.813118, 88.093979,
90.400864, 88.045921, 86.748299, 85.450684, 84.489479, 83.287979,
83.432159), AAC = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), AACG = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_)), row.names = c(NA, -20L), class = c("data.table",
"data.frame"), sorted = "Date")
library(data.table)
library(lubridate)  # for duration()
StartDate <- as.Date("2000-01-05")
TestDates <- c(StartDate,
StartDate + duration(6, units = "day"),
StartDate + duration(2, units = "week"))
DT[Date %in% TestDates, ] # works well here.
My real "DT" has 20 million rows. Running the same block of code on it, R reported:
Empty data.table (0 rows and 7347 cols)
Is there a more reliable way to select rows that match a vector of dates?
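A filter like this can fail silently: if the lookup dates are off, %in% simply matches nothing and you get an empty table. A minimal base-R sketch that fails loudly instead; the helper select_dates is hypothetical, and a small data.frame stands in for the 20-million-row DT:

```r
# Hypothetical helper: select rows whose Date is in `dates`, but stop if
# `dates` is not a Date vector and warn about requested dates with no match.
select_dates <- function(df, dates) {
  if (!inherits(dates, "Date")) {
    stop("expected a Date vector, got: ", paste(class(dates), collapse = "/"))
  }
  unmatched <- dates[!(dates %in% df$Date)]
  if (length(unmatched) > 0) {
    warning("no rows for: ", paste(format(unmatched), collapse = ", "))
  }
  df[df$Date %in% dates, , drop = FALSE]
}

# Small stand-in for DT: ten consecutive days starting 2000-01-03
df <- data.frame(Date = as.Date("2000-01-03") + 0:9, A = 1:10)

TestDates <- as.Date("2000-01-05") + c(0, 6)  # 2000-01-05 and 2000-01-11
res <- select_dates(df, TestDates)
print(res)
```

The warning is the useful part on real data: an empty result then comes with the list of dates that were never found, which points straight at how TestDates was built.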
I found the problem. I set a base date with
StartDate <- as.Date("2000-01-05")
and then derived the other dates from it with this code:
TestDates <- c(StartDate,
StartDate + duration(6, units = "day"),
StartDate + duration(2, units = "week"))
But using duration() is wrong here. Instead, I need periods:
TestDates <- c(StartDate,
StartDate + days(6),
StartDate + weeks(2))
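For whole-day offsets on a Date, base R arithmetic gives the same result as days() and weeks(), because a Date is stored as a count of days since 1970-01-01. A sketch without lubridate, shown only for comparison:

```r
# Base R: adding a plain number to a Date shifts it by that many days
StartDate <- as.Date("2000-01-05")
TestDates <- c(StartDate,
               StartDate + 6,      # same result as StartDate + days(6)
               StartDate + 7 * 2)  # same result as StartDate + weeks(2)
print(TestDates)
print(class(TestDates))            # still "Date", so it compares cleanly with DT$Date
```

This only covers fixed-length units; for months and years, where the calendar length varies, lubridate's periods (or seq() with by = "month" / "year") are the safer choice.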
In my case, I need data from different years, for example 2000-01-01 and 2020-01-01. A duration is an exact number of seconds (a one-year duration is 365.25 days), whereas periods such as seconds(), minutes(), hours(), days(), weeks(), months() and years() follow clock and calendar time, so I do not need to worry about leap years. For example:
StartDate <- ymd("2020-01-01")  # note: 2020 is a leap year
StartDate + duration(1, units = "year")
#> [1] "2020-12-31 06:00:00 UTC"
StartDate + years(1)
#> [1] "2021-01-01"
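The same contrast can be reproduced in base R, which makes the mechanics visible: a duration year adds a fixed 365.25 * 86400 seconds, while a period year moves the calendar. A sketch under that assumption; seq() stands in for years(1) here, and the variable names are illustrative:

```r
start <- as.Date("2020-01-01")  # 2020 is a leap year (366 days)

# Duration-style: add a fixed 365.25 days' worth of seconds to a UTC time
dur_year <- as.POSIXct("2020-01-01", tz = "UTC") + 365.25 * 86400
print(dur_year)          # a POSIXct on Dec 31 at 06:00, not a Date
print(as.Date(dur_year)) # converting back gives 2020-12-31, still the wrong day

# Period-style: step one calendar year with seq()
cal_year <- seq(start, by = "1 year", length.out = 2)[2]
print(cal_year)
```

Any date built the duration way is off by a fraction of a day (or a whole day after truncation), so it never lines up with the Date column and the filter comes back empty.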