Creating dummy variables based on percentile rank

Question

I want to create dummy variables for a regression. The data looks roughly like this:

Year  Month Price  Volume  Return StockCode
1991  1       10     300     1.2  AAPL
1991  2       11     320     1.3  AAPL
1992  1       23     310     2.1  AMZN
1992  2       22     302     2.3  AMZN

I would like to rank the observations by percentile on Price, Volume, and Return, and create a dummy variable for each of these variables for each stock. The top 30% should be assigned 1, the middle 40% should be assigned 0, and the bottom 30% should be assigned -1. Ideally the data frame should look something like this:

Year Month D_Price D_Volume D_Return StockCode
1991  1       -1     -1       -1      AAPL
1991  2       0       1        0      AAPL
1992  1       1       0        0      AMZN
1992  2       0       0        1      AMZN

I've looked for resources online and on Stack Overflow, but I couldn't find an example that shows how to approach this problem. Any help is appreciated. Thanks!

Answer 1

You can use dplyr::percent_rank and cut.

library(dplyr)

df %>%
  mutate_at(
    vars(Price, Volume, Return),
    # percent_rank() rescales ranks to [0, 1]; cut() bins them at 0.3 and 0.7
    list(cut = function(x) cut(percent_rank(x), c(-Inf, .3, .7, Inf), labels = c(-1, 0, 1)))
  )

  Year Month Price Volume Return StockCode Price_cut Volume_cut Return_cut
1 1991     1    10    300    1.2      AAPL        -1         -1         -1
2 1991     2    11    320    1.3      AAPL         0          1          0
3 1992     1    23    310    2.1      AMZN         1          0          0
4 1992     2    22    302    2.3      AMZN         0          0          1
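Two details may matter for a regression: cut() returns a factor rather than a number, and the question mentions ranking "for each stock". The following is only a sketch of one possible variant, assuming dplyr 1.0.0 or later (for across()). The helper name rank_dummy and the D_ column prefix are illustrative choices, not part of the answer above. It converts the labels to numeric values and shows both a whole-sample ranking and a per-stock ranking via group_by(StockCode).

```r
library(dplyr)

df <- data.frame(
  Year = c(1991, 1991, 1992, 1992),
  Month = c(1, 2, 1, 2),
  Price = c(10, 11, 23, 22),
  Volume = c(300, 320, 310, 302),
  Return = c(1.2, 1.3, 2.1, 2.3),
  StockCode = c("AAPL", "AAPL", "AMZN", "AMZN")
)

# Hypothetical helper: bottom 30% -> -1, middle 40% -> 0, top 30% -> 1.
rank_dummy <- function(x) {
  binned <- cut(percent_rank(x), c(-Inf, .3, .7, Inf), labels = c(-1, 0, 1))
  # Go through as.character() so the labels, not the factor codes, are converted.
  as.numeric(as.character(binned))
}

# Ranking over the whole sample (same bins as the answer above).
whole <- df %>%
  mutate(across(c(Price, Volume, Return), rank_dummy, .names = "D_{.col}"))

# Ranking within each stock.
by_stock <- df %>%
  group_by(StockCode) %>%
  mutate(across(c(Price, Volume, Return), rank_dummy, .names = "D_{.col}")) %>%
  ungroup()
```

With only two rows per stock in the sample data, the per-stock version can only produce -1 and 1. The whole-sample version reproduces the values shown in the question's desired output.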

Answer 2

You can also use sapply from base R and quantile from the stats package.

Initialize your data.frame:

df <- data.frame(Year =c(1991, 1991, 1992, 1992), Month = c(1, 2, 1, 2), Price = c(10, 11, 23, 22), Volume = c(300, 320, 310, 302), Return = c(1.2, 1.3, 2.1, 2.3), StockCode= c('AAPL', 'AAPL', 'AMZN', 'AMZN'))

Make dummy variables:

dummy <- data.frame(sapply(df[c('Price', 'Volume', 'Return')], function(x) {
  y <- quantile(x, probs=c(0.3, 0.7), type = 7) #0.3 and 0.7 are your cut-off percentiles
  ifelse(x < y[1], -1, ifelse(x < y[2], 0, 1))
  }
))

Bind dummy to your other columns of interest and rename columns to get what you want:

result_df <- cbind(df[c('Year', 'Month')], dummy, df['StockCode'])
# Columns 3 to 5 hold the dummies (Price, Volume, Return); prefix them with D_
colnames(result_df)[3:5] <- paste0('D_', colnames(df)[3:5])
result_df

  Year Month D_Price D_Volume D_Return StockCode
1 1991     1      -1       -1       -1      AAPL
2 1991     2       0        1        0      AAPL
3 1992     1       1        0        0      AMZN
4 1992     2       0        0        1      AMZN
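If the cut-offs should be computed separately for each stock, as the question's wording suggests, the same quantile logic can be applied per group with ave(). This is a sketch under that assumption, not the answerer's code; the names quantile_dummy, vars, and result_by_stock are made up for illustration.

```r
df <- data.frame(
  Year = c(1991, 1991, 1992, 1992),
  Month = c(1, 2, 1, 2),
  Price = c(10, 11, 23, 22),
  Volume = c(300, 320, 310, 302),
  Return = c(1.2, 1.3, 2.1, 2.3),
  StockCode = c("AAPL", "AAPL", "AMZN", "AMZN")
)

# Hypothetical helper: -1 below the 30th percentile, 1 at or above the 70th, else 0.
quantile_dummy <- function(x, probs = c(0.3, 0.7)) {
  q <- quantile(x, probs = probs, type = 7, names = FALSE)
  ifelse(x < q[1], -1, ifelse(x < q[2], 0, 1))
}

vars <- c("Price", "Volume", "Return")

# ave() applies FUN within each StockCode group and keeps the original row order.
dummy <- lapply(df[vars], function(x) ave(x, df$StockCode, FUN = quantile_dummy))
names(dummy) <- paste0("D_", vars)

result_by_stock <- cbind(df[c("Year", "Month")], as.data.frame(dummy), df["StockCode"])
```

Because each stock has only two rows in the sample data, every per-stock dummy here comes out as either -1 or 1. A real panel with more observations per stock would also produce 0 for the middle band.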

Hope that helps!

Answer 1: score 2, accepted, 2019-05-08 23:28:31

Answer 2: score 2, 2019-05-09 00:20:59