Hierarchic pie/donut chart from Pandas DataFrame using bokeh or matplotlib
I have the following pandas DataFrame ("A" is the header of the last column; the remaining columns form a combined hierarchical index):
                                                                                                                                                A
kingdom       phylum             class              order                 family                         genus               species
No blast hit                                                                                                                                 2496
k__Archaea    p__Euryarchaeota   c__Thermoplasmata  o__E2                 f__[Methanomassiliicoccaceae]  g__vadinCA11        s__                6
k__Bacteria   p__                c__                o__                   f__                            g__                 s__                5
              p__Actinobacteria  c__Acidimicrobiia  o__Acidimicrobiales   f__                            g__                 s__                0
                                 c__Actinobacteria  o__Actinomycetales    f__Corynebacteriaceae          g__Corynebacterium  s__stationis       2
                                                                          f__Micrococcaceae              g__Arthrobacter     s__                8
                                                    o__Bifidobacteriales  f__Bifidobacteriaceae          g__Bifidobacterium  s__              506
                                                                                                                             s__animalis       48
                                 c__Coriobacteriia  o__Coriobacteriales   f__Coriobacteriaceae           g__                 s__              734
                                                                                                         g__Collinsella      s__aerofaciens     3
(A CSV with the data is available here.)
I want to plot it in a pie/donut chart, where each concentric circle is a level (kingdom, phylum, etc.) and is divided according to the sum of column A for that level, so that I end up with something similar to this, but with my data:
I've looked into matplotlib and bokeh, but the most similar thing I've found so far is the bokeh Donut chart example, which uses a deprecated chart and which I don't know how to extend to more than 2 levels.
I don't know if there is anything pre-defined that does this, but it's possible to construct your own using groupby and overlapping pie plots. I constructed the following script to take your data and get something at least similar to what you specified.
Note that the groupby calls (which are used to calculate the totals at each level) must have sorting turned off for things to line up correctly. Your dataset is also very non-uniform, so I just made some random data to spread out the resulting chart a bit for the sake of illustration.
You'll probably have to tweak colors and label positions, but it may be a start.
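As a small aside on the groupby point, here is a sketch on a made-up two-level frame (the values and the shortened names such as `k__B` and `p__Y` are invented for illustration, not taken from the CSV). With `sort=False` the groups come out in the order they first appear in the data, so, assuming the rows are already ordered hierarchically as in the CSV, the child totals sit next to each other under their parent and add up to the parent's total.

```python
import pandas as pd

# Made-up data; rows are already ordered hierarchically, as in the CSV.
toy = pd.DataFrame({
    "kingdom": ["k__B", "k__B", "k__B", "k__A"],
    "phylum":  ["p__Y", "p__Y", "p__X", "p__Z"],
    "A":       [5, 3, 2, 6],
})

# Totals per kingdom and per (kingdom, phylum), keeping first-appearance order.
level0 = toy.groupby("kingdom", sort=False)["A"].sum()
level1 = toy.groupby(["kingdom", "phylum"], sort=False)["A"].sum()

print(level0.to_dict())  # k__B first (10), then k__A (6)
print(level1.to_dict())  # (k__B, p__Y): 8, (k__B, p__X): 2, (k__A, p__Z): 6

# Sanity check: summing the children per kingdom reproduces the parent totals.
children = level1.groupby(level="kingdom", sort=False).sum()
print(children.tolist() == level0.tolist())
```

The script below applies the same pattern to all seven taxonomic levels.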
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('species.csv')
df = df.dropna()  # Drop the "no hits" line
df['A'] = np.random.rand(len(df)) * 100 + 1  # Random data, for illustration only

# Do the summing to get the values for each layer
def nested_pie(df):
    cols = df.columns.tolist()
    outd = {}
    # Sum only column A; sort=False keeps the groups in file order
    gb = df.groupby(cols[0], sort=False)['A'].sum()
    outd[0] = {'names': gb.index.tolist(), 'values': gb.values}
    for lev in range(1, 7):
        gb = df.groupby(cols[:(lev + 1)], sort=False)['A'].sum()
        # Label each wedge with its name at the current level only
        outd[lev] = {'names': gb.index.get_level_values(lev).tolist(),
                     'values': gb.values}
    return outd

outd = nested_pie(df)
diff = 1/7.0  # Radius step between successive levels
colors = plt.style.library['bmh']['axes.prop_cycle'].by_key()['color']

# This first pie chart fills the plot; it's the lowest level (species)
plt.pie(outd[6]['values'], labels=outd[6]['names'], labeldistance=0.9,
        colors=colors)
ax = plt.gca()
# For each successive plot, shrink the max radius so that they overlay
for i in range(5, -1, -1):
    # labeldistance is relative to the radius, so this puts each label
    # midway between this pie's edge and the edge of the next smaller one
    ax.pie(outd[i]['values'], labels=outd[i]['names'],
           radius=(i + 1) * diff,
           labeldistance=((2 * (i + 1) - 1) / 14.0) / ((i + 1) / 7.0),
           colors=colors)
ax.set_aspect('equal')
plt.show()
Modulo slight changes from the call to random(), this yields a plot like this:
On your real data it looks like this:
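If you would rather have actual rings with a hole in the middle than full pies stacked on top of each other, one option is to pass a `width` through `wedgeprops`, so that each call to `ax.pie` draws only an annulus. What follows is a minimal sketch on made-up two-level data, not something I have run against your CSV; the frame `toy`, the helper name `ring_totals` and the choice of ring width are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, ordered hierarchically like the CSV.
toy = pd.DataFrame({
    "kingdom": ["k__B", "k__B", "k__B", "k__A"],
    "phylum":  ["p__Y", "p__Y", "p__X", "p__Z"],
    "A":       [5, 3, 2, 6],
})
levels = ["kingdom", "phylum"]


def ring_totals(df, levels, value="A"):
    """Hypothetical helper: one Series of totals per level, top level first."""
    return [df.groupby(levels[:i + 1], sort=False)[value].sum()
            for i in range(len(levels))]


rings = ring_totals(toy, levels)
width = 1.0 / (len(levels) + 1)  # ring thickness; leaves a hole of the same size

fig, ax = plt.subplots()
for i, totals in enumerate(rings):
    names = totals.index.get_level_values(i).tolist()
    ax.pie(totals.values, labels=names,
           radius=(i + 2) * width,               # outer edge of ring i
           labeldistance=(i + 1.5) / (i + 2),    # middle of the ring, relative to radius
           wedgeprops=dict(width=width, edgecolor="w"))
ax.set_aspect("equal")
```

Call `fig.savefig(...)` or `plt.show()` afterwards to see the result. For the seven levels in your data the rings get thin, so you will probably still need to adjust the label placement.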