Plot the YoY price correlation in Python (plot the correlation of DataFrame rows)

I am trying to plot the year-over-year (YoY) correlation of a DataFrame in Python. The question is how to get, for each year, the 3 pairwise correlation coefficients, one for each pair of the variables "AAPL", "IBM" and "MSFT", and then plot them with matplotlib.

How does one calculate a correlation by row? .corrwith seems to be what is suggested, but it is not working here.

https://www.geeksforgeeks.org/python-pandas-dataframe-corrwith/

I managed to get to a pandas DataFrame where each row represents a year and each element represents the cumulative price over that year. I would like to take the correlations of the cumulative YoY prices and then plot them as a function of time.

The data looks like:

             AAPL           IBM         MSFT
Year                                        
2003   333.392142  21429.009979  6585.475002
2004   637.586428  22862.419960  6837.309986
2005  1678.695713  21121.199997  6519.779993
2006  2545.412858  20827.630028  6592.800003
2007  4603.665710  26528.350021  7638.409990
2008  5143.625731  27841.030014  6755.059990
2009  5278.287136  27444.059998  5779.759998
2010  9312.338573  33034.919891  6795.050001

The final plot is meant to look like this:

[image]

To summarize the question: how does one take this data, calculate the 3 pairwise correlations for each year, and then use matplotlib to plot the results?

The code used so far to import and manipulate the data is provided below. Note that yfinance was used to load the data.

#!pip install yfinance
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

ticker_Symbol = ["AAPL", "MSFT", "IBM"]
start_date = '2003-01-01'
end_date = '2010-12-31'

# Download daily prices and keep the opening prices (one column per ticker)
df5 = yf.download(ticker_Symbol, start=start_date, end=end_date)
df = df5["Open"]

print(df.head(3))

# Sum the daily opening prices within each calendar year
dfYearly = df.groupby(df.index.year).sum()
dfYearly.index.name = "Year"
print(dfYearly)

You cannot calculate a correlation between two single numbers.

The idea behind calculating a correlation coefficient is that there is an underlying "population" correlation coefficient, which you estimate by calculating the empirical coefficient for a data sample. But if the size of that sample is 1, you have zero information about any potential correlation.
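To make the point concrete, here is a minimal sketch that uses the yearly table from the question (the values are copied from the post; the variable names are illustrative). Correlating the columns of the aggregated table gives one coefficient per pair across all eight years together, while restricting it to a single year leaves one observation per series, for which pandas returns NaN.

```python
import pandas as pd

# Yearly cumulative prices copied from the table in the question.
yearly = pd.DataFrame(
    {
        "AAPL": [333.392142, 637.586428, 1678.695713, 2545.412858,
                 4603.665710, 5143.625731, 5278.287136, 9312.338573],
        "IBM": [21429.009979, 22862.419960, 21121.199997, 20827.630028,
                26528.350021, 27841.030014, 27444.059998, 33034.919891],
        "MSFT": [6585.475002, 6837.309986, 6519.779993, 6592.800003,
                 7638.409990, 6755.059990, 5779.759998, 6795.050001],
    },
    index=pd.Index(range(2003, 2011), name="Year"),
)

# One coefficient per pair, estimated from all eight yearly values together.
overall_corr = yearly.corr()
print(overall_corr)

# A single year gives one observation per series, so the coefficient is undefined.
one_year = yearly.loc[[2003]]
r_single = one_year["AAPL"].corr(one_year["IBM"])
print(r_single)
```

NumPy may emit a runtime warning for the single-observation case; the result is still NaN rather than a usable number.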

So if you want to calculate separate correlation coefficients for individual years, you will need data that is not already aggregated by year. Then you could in fact use corrwith as the aggregation method per year.
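A rough sketch of that approach follows. It is not the only way to do it, and it makes two assumptions: synthetic random-walk prices stand in for the unaggregated daily "Open" prices (so it runs without a download), and each pair is correlated with Series.corr inside a per-year group, which gives the same Pearson coefficient that corrwith would return for that pair. The helper name yearly_pairwise_corr and the output file name are hypothetical.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic random-walk prices stand in for the daily "Open"
# prices that yf.download would return; they are not real market data.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2003-01-01", "2010-12-31")
tickers = ["AAPL", "IBM", "MSFT"]
steps = rng.normal(0.0, 1.0, size=(len(dates), len(tickers)))
daily = pd.DataFrame(100.0 + steps.cumsum(axis=0), index=dates, columns=tickers)


def yearly_pairwise_corr(prices):
    """Pearson correlation of every column pair, computed separately per year."""
    groups = list(prices.groupby(prices.index.year))
    result = {}
    for a, b in itertools.combinations(prices.columns, 2):
        result[f"{a}-{b}"] = {year: g[a].corr(g[b]) for year, g in groups}
    out = pd.DataFrame(result)
    out.index.name = "Year"
    return out


yearly_corr = yearly_pairwise_corr(daily)
print(yearly_corr)

# One line per ticker pair, with the year on the x-axis.
fig, ax = plt.subplots()
for pair in yearly_corr.columns:
    ax.plot(yearly_corr.index, yearly_corr[pair], marker="o", label=pair)
ax.set_xlabel("Year")
ax.set_ylabel("Correlation of daily prices")
ax.legend()
fig.savefig("yearly_corr.png")
```

With real data, `daily` would be the daily opening prices from the question's download, taken before the yearly sum is applied. For interactive use, `plt.show()` can replace the `savefig` call.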
