简体   繁体   中英

R Create new columns with ifelse-function for multiple dataframes

I want to create several columns with a ifelse()-condition for multiple dataframes. In this case the dataframes are 3 time-series data for cryptocurrencies. Here is the code to download the 3 dataframes automatically:

library(tidyverse)
library(crypto)

crypto_chart <- crypto_prices()%>% select(-id, -symbol,-price_btc, -`24h_volume_usd`,-available_supply, -total_supply,-max_supply, -percent_change_1h, -percent_change_24h, -percent_change_7d, -last_updated)%>% slice(1:3)

list_cryptocurrencies <-crypto_chart$name   

map(list_cryptocurrencies,
    function(x) crypto_history(x, start_date = '20150101', end_date = '20190303')%>%
      select(-slug, -symbol, -name, -`ranknow`))%>%
set_names(list_cryptocurrencies)%>%
list2env(envir = .GlobalEnv)

##Calculating return
map(mget(list_cryptocurrencies),
function(x) x %>% mutate(`return` =   (close-open)/open * 100))%>%
list2env(mget(list_cryptocurrencies), envir = .GlobalEnv)

Now I want to detect positive overreactions (oR_pos) in the returns. I define an overreaction as a value (return) higher than the mean + 1 standard deviation. I want do this also for 1.5 and 2 standard deviations. Here ist my desired output for one cryptocurrencie (Bitcoin):

> Bitcoin
     date    open   close      return     oR_pos>1sd oR_pos>1.5sd oR_pos>2sd
1  2018-01-01 14112.2 13657.2  -3.2241607         NA           NA         NA
2  2018-01-02 13625.0 14982.1   9.9603670   9.960367     9.960367   9.960367
3  2018-01-03 14978.2 15201.0   1.4874952         NA           NA         NA
4  2018-01-04 15270.7 15599.2   2.1511784         NA           NA         NA
5  2018-01-05 15477.2 17429.5  12.6140387  12.614039    12.614039  12.614039
6  2018-01-06 17462.1 17527.0   0.3716621         NA           NA         NA
7  2018-01-07 17527.3 16477.6  -5.9889430         NA           NA         NA
8  2018-01-08 16476.2 15170.1  -7.9271919         NA           NA         NA
9  2018-01-09 15123.7 14595.4  -3.4931928         NA           NA         NA
10 2018-01-10 14588.5 14973.3   2.6376941         NA           NA         NA
11 2018-01-11 14968.2 13405.8 -10.4381288         NA           NA         NA
12 2018-01-12 13453.9 13980.6   3.9148500   3.914850           NA         NA

Now I have 3 new columns with overreactions(oR_pos) which are > 1sd; 1.5sd and 2sd.

I've already tried this code:

oR_pos_function <- function(y) {
n <- seq(1, 2, 0.5)
y[paste0("oR_pos>", n, "sd")] <-lapply(n, function(x)
ifelse(x$return > mean(x$return)+ sd(x$return),x$return, NA))
y
}

map(mget(list_cryptocurrencies), oR_pos_function)%>%
set_names(list_cryptocurrencies)%>%
list2env(envir = .GlobalEnv)

But it doesen't works. Can someone help me?

The following closely matches your intended function, adding the desired columns onto your crypto, while allowing the desired sd thresholds to be passed-in as parameter for flexibility. Aside note, the solution below uses > as per OP, but you may wish to consider movement +/- direction from sd. Using solution below could be done using instead:

col <- ifelse(returns > (r_mean+(r_sd*threshold)) | 
              returns < (r_mean-(r_sd*threshold)),
              returns,NA)

Solution as follows:

oR_pos_function <- function(returns,thresholds) {

  r_mean <- mean(returns,na.rm=T)
  r_sd <- sd(returns,na.rm=T)

  cols <- lapply(thresholds,function(threshold) {
    col <- ifelse(returns > (r_mean+(r_sd*threshold)),returns,NA)
    return(col)
  })
  cols <- as.data.frame(cols)
  names(cols) <- paste0("oR_pos>",thresholds,"sd")
  return(cols)  
}

new_cols <- oR_pos_function(returns=Bitcoin$return,thresholds=c(1,1.5,2))
Bitcoin <- cbind(Bitcoin,new_cols)

Results:

> head(Bitcoin[Bitcoin$date>="2018-01-01",])
           date    open    high     low   close      volume       market close_ratio spread     return oR_pos>1sd oR_pos>1.5sd oR_pos>2sd
1097 2018-01-01 14112.2 14112.2 13154.7 13657.2 10291200000 229119155396   0.5248042  957.5 -3.2241607         NA           NA         NA
1098 2018-01-02 13625.0 15444.6 13163.6 14982.1 16846600192 251377913955   0.7972381 2281.0  9.9603670   9.960367     9.960367   9.960367
1099 2018-01-03 14978.2 15572.8 14844.5 15201.0 16871900160 255080562912   0.4894961  728.3  1.4874952         NA           NA         NA
1100 2018-01-04 15270.7 15739.7 14522.2 15599.2 21783199744 261795321110   0.8845996 1217.5  2.1511784         NA           NA         NA
1101 2018-01-05 15477.2 17705.2 15202.8 17429.5 23840899072 292544135538   0.8898258 2502.4 12.6140387  12.614039    12.614039  12.614039
1102 2018-01-06 17462.1 17712.4 16764.6 17527.0 18314600448 294217423675   0.8043891  947.8  0.3716621         NA           NA         NA
> 

Alternative per comments:

oR_pos_function <- function(coin_data,thresholds) {

  returns <- coin_data$return
  r_mean <- mean(returns,na.rm=T)
  r_sd <- sd(returns,na.rm=T)

  cols <- lapply(thresholds,function(threshold) {
    col <- ifelse(returns > (r_mean+(r_sd*threshold)),returns,NA)
    return(col)
  })
  cols <- as.data.frame(cols)
  names(cols) <- paste0("oR_pos>",thresholds,"sd")
  coin_data <- cbind(coin_data,cols)
  return(coin_data)  
}

You can use dplyr::mutate to add any such fields

library(dplyr)
Bitcoin %>%
  mutate(oR_pos_1sd = ifelse(return > mean(return) + sd(return), return , NA),
         oR_pos_1.5sd = ifelse(return > mean(return) + 1.5*sd(return), return , NA),
         oR_pos_2sd = ifelse(return > mean(return) + 2*sd(return), return , NA)) 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM