Pandas Plotting from Pivot Table

I am trying to reproduce climate diagrams that show mean temperature and precipitation over the year for various locations.

I generated a pivot table from my CSV file as follows:

import pandas as pd

data = pd.read_csv("05_temp_rain_v2.csv")
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])

Sample data in text form:

loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6  
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4  
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7  
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5  
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2  
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6

Pivot Table:

[image: pivot table]
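Because the pivot table screenshot is not reproduced here, the following is a minimal sketch of what that table presumably looks like. It inlines the six sample rows shown above (via `io.StringIO`, which is my addition so the snippet runs without the original CSV file) and applies the same `pivot_table` call. With only one year per month in the sample, the default mean aggregation just returns the original values.

```python
import io

import pandas as pd

# The six sample rows from the question, inlined so that the sketch
# does not depend on the file "05_temp_rain_v2.csv".
csv_text = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""

data = pd.read_csv(io.StringIO(csv_text))

# Rows are indexed by (loc, month); the two value columns are averaged
# over every other dimension (here: the year).
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])
print(pivot)
```

The result has a two-level index (`loc`, `month`), which is why `pivot.xs(location)` in the loop below returns a frame indexed by month only.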

Since I am handling various locations, I am iterating over them:

locations=pivot.index.get_level_values(0).unique()

for location in locations:
    split=pivot.xs(location)

    rain=split["rain(mm)"]
    temp=split["temp(dC)"]

    plt.subplots()
    temp.plot(kind="line",color="r",).legend()
    rain.plot(kind="bar").legend()

An example plot output is shown below:

[image: example plot output]

Why are my temperature values plotted starting from February (2)?
I assume it is because the temperature values are listed in the second column.

What is the proper way to handle and plot different data (two columns) from a pivot table?

It's because line and bar plots do not set the xlim the same way. The x-axis is interpreted as categorical data in the case of the bar plot, whereas it is interpreted as continuous data for the line plot. As a result, xlim and xticks are not set identically in the two situations.

Consider this:

In [4]: temp.plot(kind="line",color="r",)
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x117f555d0>
In [5]: plt.xticks()
Out[5]: (array([ 1.,  2.,  3.,  4.,  5.,  6.]), <a list of 6 Text xticklabel objects>)

where the tick positions are an array of floats ranging from 1 to 6,

and

In [6]: rain.plot(kind="bar").legend()
Out[6]: <matplotlib.legend.Legend at 0x11c15e950>
In [7]: plt.xticks()
Out[7]: (array([0, 1, 2, 3, 4, 5]), <a list of 6 Text xticklabel objects>)

where the tick positions are an array of ints ranging from 0 to 5. The bars sit at positions 0 to 5, while the line is drawn at the month values 1 to 6, so the line appears shifted by one bar.

So the easiest fix is to replace this part:

temp.plot(kind="line", color="r",).legend()
rain.plot(kind="bar").legend()

with:

rain.plot(kind="bar").legend()
plt.plot(range(len(temp)), temp, "r", label=temp.name)
plt.legend()
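To check that this really lines the two series up, here is a small self-contained sketch using the six sample months from the question. The `Agg` backend, the explicit `ax` handle, and the variable names `bar_positions` and `line_positions` are my additions for illustration; the plotting calls are the same as above.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Sample values for January to June, taken from the question's data.
months = [1, 2, 3, 4, 5, 6]
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")

fig, ax = plt.subplots()

# The bar plot treats the index as categories and puts the bars at 0..5.
rain.plot(kind="bar", ax=ax)
bar_positions = list(ax.get_xticks())

# Draw the line at the same 0..5 positions instead of the month values 1..6.
(line,) = ax.plot(range(len(temp)), temp.values, "r", label=temp.name)
ax.legend()
line_positions = list(line.get_xdata())

print(bar_positions, line_positions)
```

Both lists should hold the same positions, so the January temperature sits over the January bar. The tick labels still show the month numbers, because the bar plot sets them from the index.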

[image: bar and line plot from pandas]

Thanks to jeanrjc's answer and this thread, I think I'm finally quite satisfied!

for location in locations:
    # print(pivot.xs(location, level=0))

    split = pivot.xs(location)
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]

    fig = plt.figure()
    ax1 = rain.plot(kind="bar")
    # Second y-axis sharing the same x-axis, used for the temperature line
    ax2 = ax1.twinx()
    # Draw the line at the bar positions so both series line up
    ax2.plot(ax1.get_xticks(), temp, linestyle='-', color="r")
    ax2.set_ylim((-5, 50.))
    # ax1.set_ylim((0, 300.))
    ax1.set_ylabel('Precipitation (mm)', color='blue')
    ax2.set_ylabel('Temperature (°C)', color='red')
    ax1.set_xlabel('Months')
    plt.title(location)
    labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    # plt.xticks(range(12), labels, rotation=45)
    ax1.set_xticklabels(labels, rotation=45)

I am getting the following output, which is very close to what I intended:

[image: sample plot]

You could loop over the results of a groupby operation:

for name, group in data[['loc', 'month', 'rain(mm)', 'temp(dC)']].groupby('loc'):
    group.set_index('month').plot()
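Note that the loop above draws both columns as lines on one axis. If you want the bar-plus-line layout from the other answers, a rough sketch combining `groupby` with a twin axis could look like this. The second location `Example_Site` and its values are invented purely for illustration, and `axes_by_location` is a hypothetical helper dict; only the first three Adria rows come from the question.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({
    # "Example_Site" and its numbers are made up for illustration only.
    "loc": ["Adria_-_Bellombra"] * 3 + ["Example_Site"] * 3,
    "month": [1, 2, 3, 1, 2, 3],
    "rain(mm)": [45.6, 31.4, 1.6, 10.0, 20.0, 30.0],
    "temp(dC)": [4.6, 4.0, 10.7, 1.0, 2.0, 3.0],
})

axes_by_location = {}
for name, group in data.groupby("loc"):
    # Average over years so there is one row per month.
    monthly = group.groupby("month")[["rain(mm)", "temp(dC)"]].mean()

    fig, ax_rain = plt.subplots()
    monthly["rain(mm)"].plot(kind="bar", ax=ax_rain)

    # Second y-axis for temperature, drawn at the bar positions.
    ax_temp = ax_rain.twinx()
    ax_temp.plot(ax_rain.get_xticks(), monthly["temp(dC)"].values, color="r")

    ax_rain.set_title(name)
    axes_by_location[name] = (ax_rain, ax_temp)
```

This skips the pivot table entirely: each group is already one location, so there is no need for `pivot.xs`.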
