Find the first rows in a data frame which meet a dynamic condition

Here's some sample code:

library(quantmod)
library(dplyr)


stock.prices <- getSymbols(Symbols = 'AAPL', from = '2017-08-08', to = '2017-08-17', env = NULL)[,c(2,4)]
stock.dividends <- getDividends(Symbol = 'AAPL', from = '2017-08-08', to = '2017-08-17')

summary <- merge(stock.prices, stock.dividends)
summary <- data.frame(date=index(summary), coredata(summary))
summary <- mutate(summary, buy.price = ifelse(is.na(AAPL.div), NA, lag(AAPL.Close, 1)))
summary

It produces this data:

        date AAPL.High AAPL.Close AAPL.div lag.buy.price
1 2017-08-08    161.83     160.08       NA            NA
2 2017-08-09    161.27     161.06       NA            NA
3 2017-08-10    160.00     155.32     0.63        161.06
4 2017-08-11    158.57     157.48       NA            NA
5 2017-08-14    160.21     159.85       NA            NA
6 2017-08-15    162.20     161.60       NA            NA
7 2017-08-16    162.51     160.95       NA            NA

I would like to append a column like so:

        date AAPL.High AAPL.Close AAPL.div lag.buy.price    sell.date
1 2017-08-08    161.83     160.08       NA            NA           NA
2 2017-08-09    161.27     161.06       NA            NA           NA
3 2017-08-10    160.00     155.32     0.63        161.06   2017-08-15
4 2017-08-11    158.57     157.48       NA            NA           NA
5 2017-08-14    160.21     159.85       NA            NA           NA
6 2017-08-15    162.20     161.60       NA            NA           NA
7 2017-08-16    162.51     160.95       NA            NA           NA

This finds the first date on which I can sell to break even. I buy the stock on 2017-08-09 to be eligible for the dividend the following day, paying 161.06 per share. Having received the dividend, I'd now like to sell at >= 161.06. 2017-08-15 is the first day on which I can do this.

I can run a for-loop to achieve this, but it seems rather crude and inefficient.

Is there a way to produce the 'sell.date' column using dplyr?

This should get you there:

library(quantmod)
library(tidyverse)


stock.prices <- getSymbols(Symbols = 'AAPL', from = '2017-08-08', to = '2017-08-17', env = NULL)[,c(2,4)]
stock.dividends <- getDividends(Symbol = 'AAPL', from = '2017-08-08', to = '2017-08-17')

# Merge while the data is still xts, so the date index can be kept as a column
merged <- merge(stock.prices, stock.dividends)

summary <- as_tibble(coredata(merged)) %>% 
  add_column(date = index(merged), .before = 1) %>% 
  mutate(buy.price = ifelse(is.na(AAPL.div), NA, dplyr::lag(AAPL.Close, 1)))

new_summary <- summary %>% 
  mutate(rowname = row_number(),
         sell.date = map2_chr(rowname, buy.price, function(row, buy){
           if(is.na(buy)){
             # No purchase on this row, so there is no sell date
             NA_character_
           }else{
             # Later rows whose high is at or above the buy price
             data <- summary %>% 
               mutate(rowname = row_number()) %>% 
               filter(AAPL.High >= buy, rowname > row)

             # Rows are in date order, so the first match is the earliest;
             # this is NA if the price is never reached
             as.character(data$date[1])
           }
         }))

First, you need to append the row numbers to the data frame. Then you can use purrr::map2_chr to iterate over the data (I changed your library(dplyr) to library(tidyverse) to get purrr). The purrr::map2 family takes two vector inputs (in this case two columns of your data.frame, which I took the liberty of switching to a tibble) and runs a function over each pair of values. The anonymous function I wrote filters your summary tibble for rows after the input row whose high is at or above the buy price, then returns the earliest date meeting that criterion.
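The row-wise lookup described above can be tried without downloading anything. The following is a minimal sketch, assuming a hard-coded copy of the sample data from the question; the names `prices` and `first_sell_date` are hypothetical and only serve the illustration.

```r
library(dplyr)
library(purrr)

# Hard-coded copy of the sample data, so no download is needed
prices <- tibble(
  date = as.Date(c("2017-08-08", "2017-08-09", "2017-08-10", "2017-08-11",
                   "2017-08-14", "2017-08-15", "2017-08-16")),
  AAPL.High  = c(161.83, 161.27, 160.00, 158.57, 160.21, 162.20, 162.51),
  AAPL.Close = c(160.08, 161.06, 155.32, 157.48, 159.85, 161.60, 160.95),
  AAPL.div   = c(NA, NA, 0.63, NA, NA, NA, NA)
) %>%
  mutate(buy.price = ifelse(is.na(AAPL.div), NA, dplyr::lag(AAPL.Close)))

# Hypothetical helper: first date after row `row` whose high reaches `buy`
first_sell_date <- function(row, buy, data) {
  if (is.na(buy)) return(NA_character_)
  later <- data[seq_len(nrow(data)) > row, ]
  # Rows are in date order, so the first match is the earliest one
  as.character(later$date[later$AAPL.High >= buy][1])
}

result <- prices %>%
  mutate(sell.date = map2_chr(row_number(), buy.price,
                              function(r, b) first_sell_date(r, b, prices)))
```

Only the dividend row gets a sell date; every other row stays NA. The dates come back as character because map2_chr returns a character vector, so wrap the column in as.Date() if you need real dates.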

I also made some changes to your data setup so that it uses a pipe chain and a tidier structure.

Hope this helps!

df[is.na(df$AAPL.div),'AAPL.div'] <- 0

sell.date <- 
with(df, {
  bought <- date > as.Date('2017-08-09')
  date[which.max(bought & (AAPL.Close + cumsum(AAPL.div*bought)) > 161.06)]})
sell.date     
#[1] "2017-08-15"

To add this as a column:

# ifelse() drops the Date class, so convert the date to character first
df$sell.date <- ifelse(is.na(df$lag.buy.price), NA, as.character(sell.date))

df
#          date AAPL.High AAPL.Close AAPL.div lag.buy.price  sell.date
# 1: 2017-08-08    161.83     160.08     0.00            NA       <NA>
# 2: 2017-08-09    161.27     161.06     0.00            NA       <NA>
# 3: 2017-08-10    160.00     155.32     0.63        161.06 2017-08-15
# 4: 2017-08-11    158.57     157.48     0.00            NA       <NA>
# 5: 2017-08-14    160.21     159.85     0.00            NA       <NA>
# 6: 2017-08-15    162.20     161.60     0.00            NA       <NA>
# 7: 2017-08-16    162.51     160.95     0.00            NA       <NA>
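The purchase date and price are hard-coded in the snippet above. As a rough sketch of how the same cumulative-dividend test could be wrapped up for reuse, the function below takes them as arguments; `break_even_date` and its parameter names are hypothetical, not part of any package. It also guards against one edge case: which.max() returns 1 when no element is TRUE, which would silently report the first date.

```r
# Hypothetical helper: first date after the purchase on which the close plus
# the dividends received so far exceeds the purchase price
break_even_date <- function(date, close, div, buy_date, buy_price) {
  div[is.na(div)] <- 0
  held <- date > buy_date                      # rows on which the stock is held
  hit <- held & (close + cumsum(div * held)) > buy_price
  if (!any(hit)) return(as.Date(NA))           # price never recovered
  date[which(hit)[1]]
}

# Sample data copied from the question
date  <- as.Date(c("2017-08-08", "2017-08-09", "2017-08-10", "2017-08-11",
                   "2017-08-14", "2017-08-15", "2017-08-16"))
close <- c(160.08, 161.06, 155.32, 157.48, 159.85, 161.60, 160.95)
div   <- c(NA, NA, 0.63, NA, NA, NA, NA)

sell <- break_even_date(date, close, div, as.Date("2017-08-09"), 161.06)
never <- break_even_date(date, close, div, as.Date("2017-08-09"), 1000)
```

Note that this test compares the closing price plus dividends against the purchase price, whereas the question compares the daily high alone. On this data both give the same date.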

Data used:

library(data.table)
df <- fread("
a        date AAPL.High AAPL.Close AAPL.div lag.buy.price
1 2017-08-08    161.83     160.08       NA            NA
2 2017-08-09    161.27     161.06       NA            NA
3 2017-08-10    160.00     155.32     0.63        161.06
4 2017-08-11    158.57     157.48       NA            NA
5 2017-08-14    160.21     159.85       NA            NA
6 2017-08-15    162.20     161.60       NA            NA
7 2017-08-16    162.51     160.95       NA            NA
")[, -1]

This solution is not entirely without a for loop, but I guess you meant a loop that compares each value, and that part is vectorized here. The loop is only needed in case you observe more than one dividend:

summary$sell.date <- as.Date(rep(NA, nrow(summary)))

# One iteration per purchase, i.e. per row with a buy price
for (i in which(!is.na(summary$buy.price))) {
  # Rows from the dividend date onward whose high exceeds the buy price
  hit <- seq_len(nrow(summary)) >= i & summary$AAPL.High > summary$buy.price[i]
  # First matching date, or NA if the price is never reached
  summary$sell.date[i] <- summary$date[which(hit)[1]]
}

It produces the following result:

        date AAPL.High AAPL.Close AAPL.div buy.price  sell.date
1 2017-08-08    161.83     160.08       NA        NA       <NA>
2 2017-08-09    161.27     161.06       NA        NA       <NA>
3 2017-08-10    160.00     155.32     0.63    161.06 2017-08-15
4 2017-08-11    158.57     157.48       NA        NA       <NA>
5 2017-08-14    160.21     159.85       NA        NA       <NA>
6 2017-08-15    162.20     161.60       NA        NA       <NA>
7 2017-08-16    162.51     160.95       NA        NA       <NA>
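To see why one pass per dividend is needed, here is a small sketch with two purchases. The first buy price (161.06) is from the question; the second one, 157.48 on 2017-08-14, is invented purely for illustration and does not correspond to a real AAPL dividend.

```r
# Sample highs from the question; the second buy price is INVENTED for illustration
summary <- data.frame(
  date = as.Date(c("2017-08-08", "2017-08-09", "2017-08-10", "2017-08-11",
                   "2017-08-14", "2017-08-15", "2017-08-16")),
  AAPL.High = c(161.83, 161.27, 160.00, 158.57, 160.21, 162.20, 162.51),
  buy.price = c(NA, NA, 161.06, NA, 157.48, NA, NA)
)

summary$sell.date <- as.Date(rep(NA, nrow(summary)))

# One pass per purchase; the comparison inside each pass is vectorized
for (i in which(!is.na(summary$buy.price))) {
  hit <- seq_len(nrow(summary)) >= i & summary$AAPL.High > summary$buy.price[i]
  summary$sell.date[i] <- summary$date[which(hit)[1]]
}
```

Each purchase gets its own sell date because the threshold differs per row: the first resolves to 2017-08-15, and the invented one resolves on its own row, since the search starts at the dividend date itself.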
