Plotting Pandas' pivot_table from long data
I have an xls file with data organized in long format. I have four columns: the variable name, the country name, the year and the value.
After importing the data in Python with pandas.read_excel, I want to plot the time series of one variable for different countries. To do so, I create a pivot table that transforms the data into wide format. When I try to plot with matplotlib, I get an error:
ValueError: could not convert string to float: 'ZAF'
(where 'ZAF' is the label of one country)
What's the problem?
This is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_excel('raw_emissions_energy.xls','raw data', index_col = None, thousands='.',parse_cols="A,C,F,M")
data['Year'] = data['Year'].astype(str)
data['COU'] = data['COU'].astype(str)
# generate sub-datasets for specific VARs
data_CO2PROD = pd.pivot_table(data[(data['VAR']=='CO2_PBPROD')], index='COU', columns='Year')
plt.plot(data_CO2PROD)
The xls file with raw data looks like: [image: raw data Excel view]
This is what I get from data_CO2PROD.info():
<class 'pandas.core.frame.DataFrame'>
Index: 105 entries, ARE to ZAF
Data columns (total 16 columns):
(Value, 1990) 104 non-null float64
(Value, 1995) 105 non-null float64
(Value, 2000) 105 non-null float64
(Value, 2001) 105 non-null float64
(Value, 2002) 105 non-null float64
(Value, 2003) 105 non-null float64
(Value, 2004) 105 non-null float64
(Value, 2005) 105 non-null float64
(Value, 2006) 105 non-null float64
(Value, 2007) 105 non-null float64
(Value, 2008) 105 non-null float64
(Value, 2009) 105 non-null float64
(Value, 2010) 105 non-null float64
(Value, 2011) 105 non-null float64
(Value, 2012) 105 non-null float64
(Value, 2013) 105 non-null float64
dtypes: float64(16)
memory usage: 13.9+ KB
None
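The `(Value, 1990)` style labels in the output above are two-level column labels, which is what `pivot_table` returns when no `values` argument is given. The following is a minimal sketch of that behaviour on a small made-up frame; `long_df` and its numbers are hypothetical stand-ins for the spreadsheet, not the real data.

```python
import pandas as pd

# Hypothetical long-format data standing in for the spreadsheet.
long_df = pd.DataFrame({
    "COU": ["ZAF", "ZAF", "ARE", "ARE"],
    "Year": ["1990", "1995", "1990", "1995"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Without `values`, the columns are two-level, e.g. ('Value', '1990').
wide_nested = pd.pivot_table(long_df, index="COU", columns="Year")

# With `values`, the columns are just the years.
wide_flat = pd.pivot_table(long_df, index="COU", columns="Year", values="Value")

print(wide_nested.columns.nlevels)
print(list(wide_flat.columns))
```

Either layout holds the same numbers; the flat one is usually easier to label in a plot legend.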
Using data_CO2PROD.plot() instead of plt.plot(data_CO2PROD) allowed me to plot the data; see http://pandas.pydata.org/pandas-docs/stable/visualization.html. Simple code:
import pandas as pd
import matplotlib.pyplot as plt

# Small long-format sample: variable, country, year, value
data = pd.DataFrame({
    'VAR': ['CC', 'CC', 'KK'],
    'COU': ['ZAF', 'NL', 'DK'],
    'Year': ['1987', '1987', '2006'],
    'VAL': [32, 33, 35],
})
# generate sub-datasets for specific VARs
# values='VAL' restricts the aggregation to the numeric column
data_CO2PROD = pd.pivot_table(data=data[(data['VAR']=='CC')], index='COU', columns='Year', values='VAL')
data_CO2PROD.plot()
plt.show()
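If you prefer to stay with matplotlib's own `plot`, one option is to pass numeric x-values and one row of y-values per country explicitly, so the string country codes are only used as legend labels. This is a sketch on a hypothetical wide frame (`wide` and its numbers are made up), and it assumes the original error came from matplotlib trying to interpret the string index as numbers, which I have not verified against the matplotlib version in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide frame shaped like the pivot result:
# countries on the index, years (as strings) on the columns.
wide = pd.DataFrame(
    {"1987": [32.0, 33.0], "2006": [34.0, 35.0]},
    index=pd.Index(["NL", "ZAF"], name="COU"),
)

# Convert the year labels to numbers for the x-axis.
years = [int(y) for y in wide.columns]

fig, ax = plt.subplots()
for country, row in wide.iterrows():
    # One line per country; the country code is only a label.
    ax.plot(years, row.to_numpy(), label=country)
ax.set_xlabel("Year")
ax.legend()
```

Call `plt.show()` afterwards to display the figure interactively.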
I think you need to add the parameter values to pivot_table:
# the value column is 'VAL' in the sample frame above
# ('Value' in the original spreadsheet)
data_CO2PROD = pd.pivot_table(data=data[(data['VAR']=='CC')],
                              index='COU',
                              columns='Year',
                              values='VAL')
data_CO2PROD.plot()
plt.show()
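Since the goal in the question is a time series per country, it may also help to put the years on the index and the countries on the columns, because `DataFrame.plot()` draws one line per column with the index on the x-axis. Here is a minimal sketch under that assumption; `long_df` and its numbers are hypothetical, and the column names follow the sample frame rather than the real spreadsheet.

```python
import pandas as pd

# Hypothetical long-format data: variable, country, year, value.
long_df = pd.DataFrame({
    "VAR": ["CC", "CC", "CC", "CC", "KK"],
    "COU": ["ZAF", "ZAF", "NL", "NL", "DK"],
    "Year": [1987, 2006, 1987, 2006, 2006],
    "VAL": [32, 35, 33, 36, 40],
})

# Keep one variable, then pivot so that each country becomes a column.
subset = long_df[long_df["VAR"] == "CC"]
by_year = pd.pivot_table(subset, index="Year", columns="COU", values="VAL")

# One line per country, years along the x-axis.
ax = by_year.plot()
ax.set_ylabel("VAL")
```

The same result can be reached by transposing an existing country-by-year pivot with `.T` before plotting.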