简体   繁体   中英

R dplyr rowwise mutate

Good morning all, this is my first time posting on stack overflow. Thank you for any help!

I have 2 dataframes that I am using to analyze stock data. One data frame has dates among other information, we can call it df:

df1 <- tibble(Key = c('a','b','c'), i =11:13, date= ymd(20110101:20110103))

The second dataframe also has dates and other important information.

df2 <-tibble(Answer = c('a','d','e','b','f','c'), j =14:19, date= ymd(20150304:20150309))

Here is what I want to do: For each row in df1, I need to:

-Find the date in df2 where, for when df2$answer is the same as df1$key, it is the closest to that row's date in df1.

-Then extract information for another column in that row in df2, and put it in a new row in df1.

The code i tried:

df1 %>%
 group_by(Key, i) %>%
 mutate(
`New Column` =  df2$j[
  which.min(subset(df2$date, df2$Answer== Key) - date)])

This has the result:

Key       i date       `New Column`
1 a        11 2011-01-01           14
2 b        12 2011-01-02           14
3 c        13 2011-01-03           14

This is correct for the first row, a . In df2 , the closest date is 2015-03-04 , for which the value of j is in fact 14 .

However, for the second row, Key=b , I want df2 to subset to only look at dates for rows where df2$Answer = b . Therefore, the date should be 2015-03-07 , for which j =17 .

Thank you for your help!

Jesse

This should work:

library(dplyr)
df1 %>% 
  left_join(df2, by = c("Key" = "Answer")) %>% 
  mutate(date_diff = abs(difftime(date.x, date.y, units = "secs"))) %>% 
  group_by(Key) %>% 
  arrange(date_diff) %>% 
  slice(1) %>% 
  ungroup()

We are first joining the two data frames with left_join . Yes, I'm aware there are possibly multiple dates for each Key , bear with me.

Next, we calculate (with mutate ) the absolute value ( abs ) of the difference between the two dates date.x and date.y .

Now that we have this, we will group the data by Key using group_by . This will make sure that each distinct Key will be treated separately in subsequent calculations.

Since we've calculated the date_diff , we can now re-order ( arrange ) the data for each Key , with the smallest date_diff as first for each Key .

Finally, we are only interested in that first, smallest date_diff for each Key , so we can discard the rest using slice(1) .

This pipeline gives us the following:

  Key       i date.x         j date.y     date_diff
  <chr> <int> <date>     <int> <date>     <time>   
1 a        11 2011-01-01    14 2015-03-04 131587200
2 b        12 2011-01-02    17 2015-03-07 131760000
3 c        13 2011-01-03    19 2015-03-09 131846400

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM