简体   繁体   English

R使用ifelse函数为多个数据框创建新列

[英]R Create new columns with ifelse-function for multiple dataframes

I want to create several columns with a ifelse()-condition for multiple dataframes. 我想为多个数据框创建一个具有ifelse()条件的列。 In this case the dataframes are 3 time-series data for cryptocurrencies. 在这种情况下,数据帧是用于加密货币的3个时间序列数据。 Here is the code to download the 3 dataframes automatically: 这是自动下载3个数据框的代码:

library(tidyverse)
library(crypto)

crypto_chart <- crypto_prices()%>% select(-id, -symbol,-price_btc, -`24h_volume_usd`,-available_supply, -total_supply,-max_supply, -percent_change_1h, -percent_change_24h, -percent_change_7d, -last_updated)%>% slice(1:3)

list_cryptocurrencies <-crypto_chart$name   

map(list_cryptocurrencies,
    function(x) crypto_history(x, start_date = '20150101', end_date = '20190303')%>%
      select(-slug, -symbol, -name, -`ranknow`))%>%
set_names(list_cryptocurrencies)%>%
list2env(envir = .GlobalEnv)

##Calculating return
map(mget(list_cryptocurrencies),
function(x) x %>% mutate(`return` =   (close-open)/open * 100))%>%
list2env(mget(list_cryptocurrencies), envir = .GlobalEnv)

Now I want to detect positive overreactions (oR_pos) in the returns. 现在,我想在返回中检测到积极的过度反应(oR_pos)。 I define an overreaction as a value (return) higher than the mean + 1 standard deviation. 我将过度反应定义为高于平均值+ 1标准偏差的值(返回值)。 I want do this also for 1.5 and 2 standard deviations. 我也想针对1.5和2个标准差进行此操作。 Here ist my desired output for one cryptocurrencie (Bitcoin): 这是我想要的一种加密货币(比特币)的输出:

> Bitcoin
     date    open   close      return     oR_pos>1sd oR_pos>1.5sd oR_pos>2sd
1  2018-01-01 14112.2 13657.2  -3.2241607         NA           NA         NA
2  2018-01-02 13625.0 14982.1   9.9603670   9.960367     9.960367   9.960367
3  2018-01-03 14978.2 15201.0   1.4874952         NA           NA         NA
4  2018-01-04 15270.7 15599.2   2.1511784         NA           NA         NA
5  2018-01-05 15477.2 17429.5  12.6140387  12.614039    12.614039  12.614039
6  2018-01-06 17462.1 17527.0   0.3716621         NA           NA         NA
7  2018-01-07 17527.3 16477.6  -5.9889430         NA           NA         NA
8  2018-01-08 16476.2 15170.1  -7.9271919         NA           NA         NA
9  2018-01-09 15123.7 14595.4  -3.4931928         NA           NA         NA
10 2018-01-10 14588.5 14973.3   2.6376941         NA           NA         NA
11 2018-01-11 14968.2 13405.8 -10.4381288         NA           NA         NA
12 2018-01-12 13453.9 13980.6   3.9148500   3.914850           NA         NA

Now I have 3 new columns with overreactions(oR_pos) which are > 1sd; 现在我有3个新列,它们的反应过度(oR_pos)> 1sd; 1.5sd and 2sd. 1.5sd和2sd。

I've already tried this code: 我已经尝试过此代码:

oR_pos_function <- function(y) {
n <- seq(1, 2, 0.5)
y[paste0("oR_pos>", n, "sd")] <-lapply(n, function(x)
ifelse(x$return > mean(x$return)+ sd(x$return),x$return, NA))
y
}

map(mget(list_cryptocurrencies), oR_pos_function)%>%
set_names(list_cryptocurrencies)%>%
list2env(envir = .GlobalEnv)

But it doesen't works. 但这是行不通的。 Can someone help me? 有人能帮我吗?

The following closely matches your intended function, adding the desired columns onto your crypto, while allowing the desired sd thresholds to be passed-in as parameter for flexibility. 以下内容与您的预期功能紧密匹配,将所需的列添加到您的加密货币上,同时允许将所需的sd阈值作为参数传入以提高灵活性。 Aside note, the solution below uses > as per OP, but you may wish to consider movement +/- direction from sd. 顺便提一下,下面的解决方案按照OP使用> ,但是您可能希望考虑从sd移动+/-方向。 Using solution below could be done using instead: 可以使用以下解决方案代替:

col <- ifelse(returns > (r_mean+(r_sd*threshold)) | 
              returns < (r_mean-(r_sd*threshold)),
              returns,NA)

Solution as follows: 解决方法如下:

oR_pos_function <- function(returns,thresholds) {

  r_mean <- mean(returns,na.rm=T)
  r_sd <- sd(returns,na.rm=T)

  cols <- lapply(thresholds,function(threshold) {
    col <- ifelse(returns > (r_mean+(r_sd*threshold)),returns,NA)
    return(col)
  })
  cols <- as.data.frame(cols)
  names(cols) <- paste0("oR_pos>",thresholds,"sd")
  return(cols)  
}

new_cols <- oR_pos_function(returns=Bitcoin$return,thresholds=c(1,1.5,2))
Bitcoin <- cbind(Bitcoin,new_cols)

Results: 结果:

> head(Bitcoin[Bitcoin$date>="2018-01-01",])
           date    open    high     low   close      volume       market close_ratio spread     return oR_pos>1sd oR_pos>1.5sd oR_pos>2sd
1097 2018-01-01 14112.2 14112.2 13154.7 13657.2 10291200000 229119155396   0.5248042  957.5 -3.2241607         NA           NA         NA
1098 2018-01-02 13625.0 15444.6 13163.6 14982.1 16846600192 251377913955   0.7972381 2281.0  9.9603670   9.960367     9.960367   9.960367
1099 2018-01-03 14978.2 15572.8 14844.5 15201.0 16871900160 255080562912   0.4894961  728.3  1.4874952         NA           NA         NA
1100 2018-01-04 15270.7 15739.7 14522.2 15599.2 21783199744 261795321110   0.8845996 1217.5  2.1511784         NA           NA         NA
1101 2018-01-05 15477.2 17705.2 15202.8 17429.5 23840899072 292544135538   0.8898258 2502.4 12.6140387  12.614039    12.614039  12.614039
1102 2018-01-06 17462.1 17712.4 16764.6 17527.0 18314600448 294217423675   0.8043891  947.8  0.3716621         NA           NA         NA
> 

Alternative per comments: 每个注释的替代方案:

oR_pos_function <- function(coin_data,thresholds) {

  returns <- coin_data$return
  r_mean <- mean(returns,na.rm=T)
  r_sd <- sd(returns,na.rm=T)

  cols <- lapply(thresholds,function(threshold) {
    col <- ifelse(returns > (r_mean+(r_sd*threshold)),returns,NA)
    return(col)
  })
  cols <- as.data.frame(cols)
  names(cols) <- paste0("oR_pos>",thresholds,"sd")
  coin_data <- cbind(coin_data,cols)
  return(coin_data)  
}

You can use dplyr::mutate to add any such fields 您可以使用dplyr::mutate添加任何此类字段

library(dplyr)
Bitcoin %>%
  mutate(oR_pos_1sd = ifelse(return > mean(return) + sd(return), return , NA),
         oR_pos_1.5sd = ifelse(return > mean(return) + 1.5*sd(return), return , NA),
         oR_pos_2sd = ifelse(return > mean(return) + 2*sd(return), return , NA)) 

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM