How can I calculate the correlation between each pair of variables within each group in R using the dplyr package?
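Suppose I have a dataset of historical financial data for 7 stocks that belong to 3 categories. I want to calculate, in R with the dplyr package, the correlation between each pair of stocks within each category.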
library(tidyverse)
library(tidyquant)

# Map each ticker symbol to its category
Category = c("Social", "Social", "Internet", "Technology",
             "Technology", "Internet", "Internet")
symbol = c("TWTR", "FB", "GOOG", "TSLA", "NOK", "AMZN", "AAPL")
A = tibble(Category, symbol)

# Download daily stock prices for 2021
B = tq_get(symbol,
           from = "2021-01-01",
           to = "2022-01-01")

# Attach the category to every price row
BA = left_join(B, A, by = "symbol")
BA %>% select(symbol, Category, close)
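A few days ago I posted a similar question, but there the grouping variable was numeric, which does not apply to my real-world dataset. The ideal output would look like this: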
Category | Stock1 | Stock2 | Corr |
---|---|---|---|
Social | TWTR | FB | cor(TWTR,FB) |
Internet | GOOG | AMZN | cor(GOOG,AMZN) |
Internet | GOOG | AAPL | cor(GOOG,AAPL) |
Internet | AMZN | AAPL | cor(AMZN,AAPL) |
Technology | TSLA | NOK | cor(TSLA,NOK) |
Any help on how I can do this in R using dplyr?
Optional data
var2 = c(rep("A",3), rep("B",3), rep("C",3), rep("D",3), rep("E",3), rep("F",3),
         rep("H",3), rep("I",3))
y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
       1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
       -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
       0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
length(y2)
group2 = as.character(c(rep("xx",6), rep("xy",6), rep("xz",6), rep("xx",6)))
data2 = tibble(var2, group2, y2)
data2
A simple helper function:
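fun <- function(ticker, value, ...) {
  # All unordered pairs of distinct tickers, one pair per column
  com <- combn(unique(ticker), 2)
  # One vector of values per ticker, named by ticker
  L <- split(value, ticker)
  data.frame(
    Stock1 = com[1,], Stock2 = com[2,],
    # Correlate the two value vectors of each pair; ... is passed on to cor()
    Corr = mapply(function(a, b) cor(a, b, ...), L[com[1,]], L[com[2,]])
  )
}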
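To make the two building blocks of the helper concrete, here is a minimal sketch on toy input. The vectors `ticker` and `value` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy input: three tickers with two observations each
ticker <- c("A", "A", "B", "B", "C", "C")
value  <- c(1, 2, 3, 5, 8, 4)

# combn() returns a 2-row matrix; each column is one unordered pair
com <- combn(unique(ticker), 2)
com[1, ]  # first member of each pair:  "A" "A" "B"
com[2, ]  # second member of each pair: "B" "C" "C"

# split() returns a named list with one value vector per ticker
L <- split(value, ticker)
L[["B"]]  # 3 5
```

Indexing `L` by the names in `com[1,]` and `com[2,]` is what lines up the two vectors that go into each `cor()` call.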
And in use:
library(dplyr)
data2 %>%
  group_by(group2) %>%
  summarize(fun(var2, y2), .groups = "drop")
# # A tibble: 8 x 4
#   group2 Stock1 Stock2   Corr
#   <chr>  <chr>  <chr>   <dbl>
# 1 xx     A      B      -0.995
# 2 xx     A      H      -0.958
# 3 xx     A      I       0.853
# 4 xx     B      H       0.982
# 5 xx     B      I      -0.901
# 6 xx     H      I      -0.967
# 7 xy     C      D       0.469
# 8 xz     E      F      -0.186
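The same pattern should carry over to data shaped like the question's `BA` table, with `Category` as the grouping variable. The sketch below is only an illustration: the `prices` tibble is made-up random data standing in for downloaded prices, and it uses `reframe()`, which assumes dplyr 1.1.0 or later (from that version on, `summarize()` is deprecated for results with more than one row per group). It also assumes that every ticker within a category has the same number of observations in the same date order, since `cor()` pairs values by position.

```r
library(dplyr)

# Helper from the answer, repeated so this block runs on its own
fun <- function(ticker, value, ...) {
  com <- combn(unique(ticker), 2)
  L <- split(value, ticker)
  data.frame(
    Stock1 = com[1, ], Stock2 = com[2, ],
    Corr = mapply(function(a, b) cor(a, b, ...), L[com[1, ]], L[com[2, ]])
  )
}

# Made-up closing prices (NOT real market data), 5 rows per symbol
set.seed(1)
prices <- tibble(
  Category = rep(c("Social", "Social", "Internet", "Internet", "Internet"), each = 5),
  symbol   = rep(c("TWTR", "FB", "GOOG", "AMZN", "AAPL"), each = 5),
  close    = rnorm(25, mean = 100, sd = 5)
)

# One row per pair of symbols within each category
res <- prices %>%
  group_by(Category) %>%
  reframe(fun(symbol, close))
res
```

With real prices it may be worth checking first that no symbol has missing trading days, because vectors of unequal length make `cor()` fail.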
A quick check:
cor(filter(data2, var2 == "A")$y2, filter(data2, var2 == "B")$y2)
# [1] -0.9949738
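Another way to cross-check a whole group at once, sketched here in base R, is to put each variable of the group in its own column and read the pair off the full correlation matrix. The vectors below repeat only the `xx` rows of the sample data (variables A, B, H and I); the names `var_xx`, `y_xx`, `m` and `cm` are introduced for this illustration.

```r
# Values of group "xx" from the sample data, three observations per variable
var_xx <- rep(c("A", "B", "H", "I"), each = 3)
y_xx <- c(-1.23, -0.983, 1.28,    # A
          -0.268, -0.46, -1.23,   # B
          0.987, 0.324, -0.783,   # H
          -0.679, 0.326, 0.998)   # I

# One column per variable, then the full pairwise correlation matrix
m  <- do.call(cbind, split(y_xx, var_xx))
cm <- cor(m)

cm["A", "B"]  # matches the A/B value checked above
```

The off-diagonal entries of `cm` hold the same pairwise values that the helper returns for this group, in matrix rather than long form.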