Plotting & formatting seaborn chart from pandas dataframe

I have a pandas dataframe al_df that contains the population of Alabama from a recent US census. I created a cumulative function that I plot using seaborn, resulting in this chart:

[Image: Alabama population CDF]
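For context, here is a minimal sketch of how a cumulative column like pop_cum_perc might be computed. The POPULATION column name, the placeholder city names, and the numbers are all assumptions made for illustration; only al_df, NAME, and pop_cum_perc appear in the question.

```python
import pandas as pd

# Hypothetical stand-in for al_df: placeholder names and made-up populations.
al_df = pd.DataFrame({
    "NAME": ["City A", "City B", "City C", "City D"],
    "POPULATION": [200, 400, 100, 300],  # assumed column name
})

# Sort from largest to smallest so the new index works as a rank.
al_df = al_df.sort_values("POPULATION", ascending=False).reset_index(drop=True)

# Running total divided by the grand total gives a fraction between 0 and 1.
al_df["pop_cum_perc"] = al_df["POPULATION"].cumsum() / al_df["POPULATION"].sum()
```

With this layout the last row is always 1.0, which is why a y axis running from 0 to 1 fits the plot.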

The code that relates to the plotting is this:

import matplotlib.pyplot as plt

plt.figure(num=None, figsize=(20, 10))

plt.title('Cumulative Distribution Function for ALABAMA population')
plt.xlabel('City')
plt.ylabel('Percentage')
#sns.set_style("whitegrid", {"ytick.major.size": "0.1",})
plt.plot(al_df.pop_cum_perc)

My questions are:

1) How can I change the ticks so that the y axis shows a grid line every 0.1 units instead of the default 0.2?
2) How can I change the x axis to show the actual names of the cities, plotted vertically, instead of the "rank" of the city (from the pandas index)? There are over 300 names, so they are not going to fit well horizontally.

For question 1), add:

plt.yticks(np.arange(0,1+0.1,0.1))
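A self-contained sketch of the same idea, using made-up values in place of al_df.pop_cum_perc. It uses np.linspace rather than np.arange only because linspace takes the number of ticks directly. That is a swapped-in choice, not something the line above requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up cumulative fractions standing in for al_df.pop_cum_perc.
values = np.linspace(0.05, 1.0, 20)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(values)

# Eleven evenly spaced ticks: 0.0, 0.1, ..., 1.0.
ticks = np.linspace(0, 1, 11)
ax.set_yticks(ticks)

# Draw a horizontal grid line at each y tick.
ax.grid(True, axis="y")
```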

For question 2), I found this in the matplotlib gallery: the ticks_and_spines example code.

The matplotlib way would be to use MultipleLocator. The second one is also straightforward:

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
plt.plot(range(10))
ax=plt.gca()
ax.yaxis.set_major_locator(MultipleLocator(0.5))
plt.xticks(range(10), list('ABCDEFGHIJ'), rotation=90) #would be range(3xx), List_of_city_names, rotation=90
plt.savefig('temp.png')


After some research, and not being able to find a "native" seaborn solution, I came up with the code below, partially based on the suggestions from @Pablo Reyes and @CT Zhu, and using matplotlib functions:

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

plt.figure(num=None, figsize=(20, 10))

plt.title('Cumulative Distribution Function for ALABAMA population')
plt.xlabel('City')
plt.ylabel('Percentage')
plt.plot(al_df.pop_cum_perc)

#set the tick size of y axis
ax = plt.gca()
ax.yaxis.set_major_locator(MultipleLocator(0.1))

#set the tick spacing of the x axis, its labels, and the text orientation
ax.xaxis.set_major_locator(MultipleLocator(10))
ax.set_xticklabels(labels, rotation=90)

The solution introduced a new element, labels, which I had to define before the plot, as an array of city names taken from my pandas dataframe:

labels = al_df.NAME.values[:]

This produces the following chart: [image]

This requires some tweaking, since asking for every city in the pandas dataframe to be displayed, like this:

ax.xaxis.set_major_locator(MultipleLocator(1))

produces a chart that is impossible to read (only the x axis is displayed): [image]
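One caveat worth checking: set_xticklabels assigns the labels to the ticks in order, so with a tick every 10 positions the first few names in labels end up on the ticks rather than every tenth name. A possible way to keep names and positions aligned is to set the tick positions explicitly and slice the names with the same indices. The sketch below assumes a step of 10 and uses 300 placeholder names with a made-up curve. None of the data comes from the original dataframe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: placeholder names and a made-up cumulative curve.
names = np.array([f"City {i}" for i in range(300)])
cum_perc = np.linspace(1 / 300, 1.0, 300)

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(cum_perc)

# Label every 10th city; the same index array picks both position and name.
step = 10
positions = np.arange(0, len(names), step)  # 0, 10, 20, ...
ax.set_xticks(positions)
ax.set_xticklabels(names[positions], rotation=90)
```

With the real dataframe, names would be al_df.NAME.values, and step can be raised or lowered until the labels stop overlapping.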
