Find minimum value since in R

I've been seeing a lot of news outlets reporting that "country X reported its lowest number of new coronavirus cases since date Y", so I wanted to try to do this in R, but I just can't figure out how.

Here's the data I have for Italy, for example:

italy <- tibble::tribble(
   ~country,   ~date, ~cases_day,
  "Italy", "2020-03-16", 3233L,
  "Italy", "2020-03-17", 3526L,
  "Italy", "2020-03-18", 4207L,
  "Italy", "2020-03-19", 5322L,
  "Italy", "2020-03-20", 5986L,
  "Italy", "2020-03-21", 6557L,
  "Italy", "2020-03-22", 5560L,
  "Italy", "2020-03-23", 4789L,
  "Italy", "2020-03-24", 5249L,
  "Italy", "2020-03-25", 5210L,
  "Italy", "2020-03-26", 6203L,
  "Italy", "2020-03-27", 5909L,
  "Italy", "2020-03-28", 5974L,
  "Italy", "2020-03-29", 5217L,
  "Italy", "2020-03-30", 4050L,
  "Italy", "2020-03-31", 4053L,
  "Italy", "2020-04-01", 4782L,
  "Italy", "2020-04-02", 4668L,
  "Italy", "2020-04-03", 4585L,
  "Italy", "2020-04-04", 4805L,
  "Italy", "2020-04-05", 4316L,
  "Italy", "2020-04-06", 3599L,
  "Italy", "2020-04-07", 3039L,
  "Italy", "2020-04-08", 3836L,
  "Italy", "2020-04-09", 4204L,
  "Italy", "2020-04-10", 3951L,
  "Italy", "2020-04-11", 4694L,
  "Italy", "2020-04-12", 4092L,
  "Italy", "2020-04-13", 3153L,
  "Italy", "2020-04-14", 2972L
  )

I want to create a column that gives, for each row, the most recent earlier date on which the number of cases was lower than in the current row. So the desired result for the first 10 rows would be something like:

tibble::tribble(
  ~country,        ~date, ~cases_day, ~minimum_since,
   "Italy", "2020-03-16",      3233L,             NA,
   "Italy", "2020-03-17",      3526L,   "2020-03-16",
   "Italy", "2020-03-18",      4207L,   "2020-03-17",
   "Italy", "2020-03-19",      5322L,   "2020-03-18",
   "Italy", "2020-03-20",      5986L,   "2020-03-19",
   "Italy", "2020-03-21",      6557L,   "2020-03-20",
   "Italy", "2020-03-22",      5560L,   "2020-03-19",
   "Italy", "2020-03-23",      4789L,   "2020-03-18",
   "Italy", "2020-03-24",      5249L,   "2020-03-23",
   "Italy", "2020-03-25",      5210L,   "2020-03-23"
  )

I guess this could be done using something like accumulate? But I'm just stuck here. Thanks in advance for any help!

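1) Base R Given row number i, the call wx(i, italy) returns the row number of the most recent earlier row with a lower value. Inside wx, which(...) finds the row numbers among the first i-1 rows whose V3 value is less than row i's V3 value, and tail takes the last one, or NA if there are none. The value returned by wx is then used to index into the dates. No packages are used. The code assumes italy has columns named V1, V2 and V3 (country, date and daily cases), as in the output shown; with the question's tibble, substitute date for V2 and cases_day for V3.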

nr <- nrow(italy)
wx <- function(i, data) {
  ix <- with(data, which(head(V3, i-1) < V3[i]))
  tail(c(NA, ix), 1)
}
transform(italy, min_since = V2[sapply(1:nr, wx, data = italy)])

giving:

      V1         V2   V3  min_since
1  Italy 2020-03-16 3233       <NA>
2  Italy 2020-03-17 3526 2020-03-16
3  Italy 2020-03-18 4207 2020-03-17
4  Italy 2020-03-19 5322 2020-03-18
5  Italy 2020-03-20 5986 2020-03-19
6  Italy 2020-03-21 6557 2020-03-20
7  Italy 2020-03-22 5560 2020-03-19
8  Italy 2020-03-23 4789 2020-03-18
9  Italy 2020-03-24 5249 2020-03-23
10 Italy 2020-03-25 5210 2020-03-23
11 Italy 2020-03-26 6203 2020-03-25
12 Italy 2020-03-27 5909 2020-03-25
13 Italy 2020-03-28 5974 2020-03-27
14 Italy 2020-03-29 5217 2020-03-25
15 Italy 2020-03-30 4050 2020-03-17
16 Italy 2020-03-31 4053 2020-03-30
17 Italy 2020-04-01 4782 2020-03-31
18 Italy 2020-04-02 4668 2020-03-31
19 Italy 2020-04-03 4585 2020-03-31
20 Italy 2020-04-04 4805 2020-04-03
21 Italy 2020-04-05 4316 2020-03-31
22 Italy 2020-04-06 3599 2020-03-17
23 Italy 2020-04-07 3039       <NA>
24 Italy 2020-04-08 3836 2020-04-07
25 Italy 2020-04-09 4204 2020-04-08
26 Italy 2020-04-10 3951 2020-04-08
27 Italy 2020-04-11 4694 2020-04-10
28 Italy 2020-04-12 4092 2020-04-10
29 Italy 2020-04-13 3153 2020-04-07
30 Italy 2020-04-14 2972       <NA>
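
As a cross-check, here is a minimal sketch of the same idea written against the question's original column names (date, cases_day) and restricted to the first ten rows, so the result can be compared with the desired output in the question. The names italy10 and last_lower are made up for this illustration and are not part of the answer above.

```r
# First ten rows of the question's data, with its original column names.
italy10 <- data.frame(
  country   = "Italy",
  date      = as.character(seq(as.Date("2020-03-16"), by = "day", length.out = 10)),
  cases_day = c(3233L, 3526L, 4207L, 5322L, 5986L, 6557L, 5560L, 4789L, 5249L, 5210L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for row i, the index of the most recent earlier row
# whose value is strictly lower than x[i], or NA if there is no such row.
last_lower <- function(i, x) {
  ix <- which(head(x, i - 1) < x[i])
  if (length(ix)) max(ix) else NA_integer_
}

# vapply() enforces that every call returns exactly one integer.
idx <- vapply(seq_len(nrow(italy10)), last_lower, integer(1), x = italy10$cases_day)

# Indexing with NA yields NA, so rows without an earlier lower value get NA.
italy10$minimum_since <- italy10$date[idx]
italy10
```

Using vapply rather than sapply is only a defensive choice here: it fails loudly if the helper ever returns something other than a single integer.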

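2) sqldf We can also do it in SQL by left joining each row of italy to every other row of italy that has both an earlier date (V2) and a lower case count (V3). Then, for each row, take the values from the joined row having the largest V2. The dates are ISO-formatted strings, so comparing them as text gives chronological order. sqldf uses SQLite by default, where a bare column such as b.V2 in a query with max() is taken from the row that holds the maximum, so min_since and prior_min come out identical here.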

library(sqldf)

sqldf("select a.*, b.V2 min_since, max(b.V2) prior_min
  from italy a
  left join italy b
  on a.V2 > b.V2 and a.V3 > b.V3
  group by a.V2")
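
For readers without sqldf installed, the join condition can be mimicked in base R with outer(). This is only a rough sketch of the same self-join logic, not a translation of what sqldf does internally; it assumes the V1, V2, V3 column names used above and works on the first ten rows only. The names italy10, earlier, lower and matched are made up for the illustration.

```r
# First ten rows, using the V1/V2/V3 column names from the answer.
italy10 <- data.frame(
  V1 = "Italy",
  V2 = as.character(seq(as.Date("2020-03-16"), by = "day", length.out = 10)),
  V3 = c(3233L, 3526L, 4207L, 5322L, 5986L, 6557L, 5560L, 4789L, 5249L, 5210L),
  stringsAsFactors = FALSE
)

# Entry [a, b] is TRUE when row a is later than row b (a.V2 > b.V2) ...
earlier <- outer(italy10$V2, italy10$V2, ">")
# ... and when row a has more cases than row b (a.V3 > b.V3).
lower <- outer(italy10$V3, italy10$V3, ">")

# Both conditions together play the role of the ON clause of the join.
matched <- earlier & lower

# For each row a, take the largest matching date, like max(b.V2) per group;
# rows with no match get NA, as they would from the left join.
min_since <- apply(matched, 1, function(m) {
  if (any(m)) max(italy10$V2[m]) else NA_character_
})

cbind(italy10, min_since)
```

Like the SQL join, this compares every pair of rows, so it is quadratic in the number of rows; that is fine for a few dozen days of data.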
