R: Filter rows where the value in another row matches a condition

Question

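I have a large data frame of long-format financial data covering more than 1,000 assets, and I am trying to analyze how the returns of one stock affect the returns of other stocks. I want to filter a plot so that it shows the values of one asset on the days when another asset's value is X.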

I have created a mock data frame to illustrate my problem:

library(dplyr)
library(ggplot2)

the_date <- c('01-01-1990', '02-01-1990', '03-01-1990', '04-01-1990', '05-01-1990', '01-01-1990', '02-01-1990', '01-01-1990', '02-01-1990','03-01-1990')
the_asset <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT','AMZN', 'AMZN', 'AMZN')
the_price <- as.numeric(c(5,6,4,7,8,12,14,50,48,62))
the_returns <- as.numeric(c(0.1, -0.2, 0.14, 0.01, 0.05, -0.002, -0.11, 0.07, 0.08, 0.22))

test_df1 <- data.frame(the_date, the_asset, the_price, the_returns)

test_df1 <- test_df1 %>%
  group_by(the_asset) %>%
  mutate(quartile = ntile(the_returns, n=4))

I then plot the returns by quartile with ggplot:

test_df1 %>%
  group_by(quartile) %>%
  ggplot(aes(x = quartile, y = the_returns)) +
  geom_bar(stat = 'identity') +
  ggtitle('Returns by Quartile')

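I would like to filter this plot in the following way:

I want to see the returns by quartile for 'AAPL' (or any other stock) on the dates when 'AMZN' (or any other stock) is in quartile 1 (as an example; it could equally be 2 or 3).

I have considered reshaping the data frame to wide format so that each asset gets its own column, but I am not sure whether that is the best option or exactly how to go about it.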

Answer 1

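This should work, but on the small data set it will not return a very nice plot, because too few rows are left after filtering on the condition you mentioned: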

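library(tidyr)  # spread() comes from tidyr

test_df1 %>%
  ungroup() %>%  # drop the grouping left over from group_by(the_asset)
  spread(the_asset, quartile, fill = 0) %>%
  filter(AMZN == 1) %>%
  ggplot(aes(x = as.factor(AAPL), y = the_returns)) +
  geom_bar(stat = 'identity') +
  ggtitle('Returns by Quartile')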
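
One caveat with the wide-format approach: spread() treats every column other than the key and value as an identifier, so the_price and the_returns keep each asset on its own row and the AAPL column is filled with 0 wherever AMZN is 1. As an alternative, here is a minimal sketch that stays in long format and uses only dplyr. It assumes the goal is "AAPL rows on the dates when AMZN is in quartile 1"; the names amzn_q1_dates and aapl_given_amzn are made up for illustration and are not from the original post.

```r
library(dplyr)

# Mock data from the question (the_price omitted, it is not needed here)
the_date <- c('01-01-1990', '02-01-1990', '03-01-1990', '04-01-1990', '05-01-1990',
              '01-01-1990', '02-01-1990', '01-01-1990', '02-01-1990', '03-01-1990')
the_asset <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'AMZN', 'AMZN', 'AMZN')
the_returns <- c(0.1, -0.2, 0.14, 0.01, 0.05, -0.002, -0.11, 0.07, 0.08, 0.22)

test_df1 <- data.frame(the_date, the_asset, the_returns) %>%
  group_by(the_asset) %>%
  mutate(quartile = ntile(the_returns, n = 4)) %>%
  ungroup()

# Dates on which the conditioning asset (AMZN) is in quartile 1
amzn_q1_dates <- test_df1 %>%
  filter(the_asset == 'AMZN', quartile == 1) %>%
  select(the_date)

# Keep only the AAPL rows that fall on those dates
aapl_given_amzn <- test_df1 %>%
  filter(the_asset == 'AAPL') %>%
  semi_join(amzn_q1_dates, by = 'the_date')

print(aapl_given_amzn)
```

The result can be piped into the same ggplot() call as in the question, with x = as.factor(quartile) and y = the_returns. Swapping the two asset names or the quartile number gives the other conditional views.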

Answer 2

You can solve this with SQL (I am sure you can do the same thing with dplyr, but I am more comfortable with SQL...).

# Add a column with the count of dates that are the same
library(sqldf) # you might have to install other packages to make this one work

test_df1<-sqldf("SELECT count(*) OVER (PARTITION BY the_date) AS SAME_DATE, *  
      FROM test_df1 ")
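
For readers who prefer to stay in dplyr, the window count above can probably be reproduced with add_count(). This is a sketch under the assumption that SAME_DATE is only meant to hold the number of rows sharing each date; unlike the SQL version, it appends the new column at the end rather than putting it first.

```r
library(dplyr)

# Mock data from the question (only the columns needed for the count)
the_date <- c('01-01-1990', '02-01-1990', '03-01-1990', '04-01-1990', '05-01-1990',
              '01-01-1990', '02-01-1990', '01-01-1990', '02-01-1990', '03-01-1990')
the_asset <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'AMZN', 'AMZN', 'AMZN')
the_returns <- c(0.1, -0.2, 0.14, 0.01, 0.05, -0.002, -0.11, 0.07, 0.08, 0.22)

test_df1 <- data.frame(the_date, the_asset, the_returns)

# Same idea as count(*) OVER (PARTITION BY the_date):
# every row gets the number of rows that share its date
test_df1 <- test_df1 %>%
  add_count(the_date, name = 'SAME_DATE')

print(test_df1)
```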

Plot:

ggplot(test_df1,
       aes(
         x = SAME_DATE,
         y = the_returns
       )) +
  geom_bar(stat = 'identity', aes(fill = the_asset)) +  # the trailing + was missing, so the title was never added
  ggtitle('Returns by Quartile')

Edit: maybe this is better:

ggplot(test_df1,
       aes(
         x = SAME_DATE,
         y = the_returns
       )) +
  geom_bar(stat = 'identity', alpha = 0.5) +
  geom_jitter(aes(shape = the_asset, color = the_asset), size = 4) +  # the trailing + was missing here as well
  ggtitle('Returns by Date')
