Plot the YoY price correlation in Python. (Plot the correlation of DataFrame rows)
I am trying to plot the year-over-year (YoY) correlation of a DataFrame in Python. The question is how to get the 3 pairwise correlation coefficients, one for each pair of the variables "AAPL", "IBM" and "MSFT", for each year, and then plot them with matplotlib.
How does one calculate a correlation by row? .corrwith seems to be what is suggested, but it is not working here.
https://www.geeksforgeeks.org/python-pandas-dataframe-corrwith/
I managed to get to a pandas DataFrame where each row represents a year and each element represents the cumulative price over that year. I would like to take the correlations of the cumulative YoY prices and then plot them as a function of time.
The data looks like:
Year AAPL IBM MSFT
2003 333.392142 21429.009979 6585.475002
2004 637.586428 22862.419960 6837.309986
2005 1678.695713 21121.199997 6519.779993
2006 2545.412858 20827.630028 6592.800003
2007 4603.665710 26528.350021 7638.409990
2008 5143.625731 27841.030014 6755.059990
2009 5278.287136 27444.059998 5779.759998
2010 9312.338573 33034.919891 6795.050001
The final plot is meant to look like this: [image]
To summarize the question: how does one take the data above, calculate the 3 pairwise correlations for each year, and then plot the results with matplotlib?
The code used so far to import and manipulate the data is provided below. Note that yfinance was used to load the data.
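#!pip install yfinance
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

ticker_Symbol = ["AAPL", "MSFT", "IBM"]
start_date = "2003-01-01"
end_date = "2010-12-31"

# Download daily prices; with several tickers the columns are a MultiIndex
df5 = yf.download(ticker_Symbol, start=start_date, end=end_date)
# Keep only the opening prices; copy so the column assignment below is safe
df = df5[["Open"]].copy()
print(df.head(3))

# Index the year of each value
df["Year"] = df.index.year
# Sum the daily opening prices within each year
dfYearly = df.groupby(["Year"]).sum()
dfYearly = dfYearly["Open"]
dfYearly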
You cannot calculate a correlation between two single numbers.
The idea behind calculating a correlation coefficient is that there is an underlying "population" correlation coefficient that you estimate by calculating the empirical coefficient for a data sample. But if the size of that sample is 1, you have zero information about any potential correlation.
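The point about a sample of size 1 can be checked directly in pandas. The sketch below is illustrative only: it copies the first three yearly rows from the question into a small frame (the name `yearly` is made up here) and compares the correlation matrix over several years with the one computed from a single year.

```python
import pandas as pd

# Yearly totals copied from the question (first three years only).
yearly = pd.DataFrame(
    {
        "AAPL": [333.392142, 637.586428, 1678.695713],
        "IBM": [21429.009979, 22862.419960, 21121.199997],
        "MSFT": [6585.475002, 6837.309986, 6519.779993],
    },
    index=pd.Index([2003, 2004, 2005], name="Year"),
)

# Several years give several observations per ticker,
# so the pairwise coefficients are defined.
across_years = yearly.corr()
print(across_years)

# One year is a single observation per ticker:
# there is no variation to correlate, so every entry is NaN.
single_year = yearly.loc[[2003]].corr()
print(single_year)
```

The first matrix describes how the yearly totals move together across all the years supplied, which is a different quantity from a separate coefficient for each year.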
So if you want to calculate separate correlation coefficients for individual years, you will need data that is not already aggregated by year. Then you could in fact use corrwith as the aggregation method per year.
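One way to follow this advice is sketched below, under some assumptions. Synthetic random-walk prices stand in for the yfinance download so that the example runs offline; the numbers are made up and are not real market data. The sketch uses `groupby(...).corr()` rather than `corrwith`, because that returns all three pairs in one call. The names `daily`, `yearly_corr` and `pairwise` are illustrative.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: synthetic daily "prices" (random walks) replace the real download.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2003-01-01", "2010-12-31")
tickers = ["AAPL", "IBM", "MSFT"]
daily = pd.DataFrame(
    100 + rng.normal(size=(len(dates), len(tickers))).cumsum(axis=0),
    index=dates,
    columns=tickers,
)

# One 3x3 correlation matrix per calendar year, computed from the daily rows.
# The result is indexed by (year, ticker) with one column per ticker.
yearly_corr = daily.groupby(daily.index.year).corr()

# Pull the three distinct pairs out of each yearly matrix.
pairwise = pd.DataFrame(
    {
        f"{a}-{b}": yearly_corr.xs(a, level=1)[b]
        for a, b in combinations(tickers, 2)
    }
)
pairwise.index.name = "Year"
print(pairwise)

# One line per pair, correlation as a function of the year.
fig, ax = plt.subplots()
for pair in pairwise.columns:
    ax.plot(pairwise.index, pairwise[pair], marker="o", label=pair)
ax.set_xlabel("Year")
ax.set_ylabel("Pearson correlation")
ax.legend()
# plt.show()  # uncomment when running interactively
```

With the real data, `daily` would be the un-aggregated opening prices (for example `df5["Open"]` from the question's code) instead of the synthetic frame.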