Pandas stacked bar plot with different shapes

Question

I'm currently experimenting with pandas and matplotlib.

I have created a pandas DataFrame which stores data like this:

cmc|coloridentity
 1 | G
 1 | R
 2 | G
 3 | G
 3 | B
 4 | B

What I want to do now is make a stacked bar plot that shows how many entries exist per cmc, with the counts for each coloridentity stacked on top of each other.

My thoughts so far:

#get all unique values of coloridentity
unique_values = df['coloridentity'].unique()

#Create two dictionaries. One for the number of entries per cost and one 
# to store the different costs for each color
color_dict_values = {}
color_dict_index = {}
for u in unique_values:
    temp_df = df['cmc'].loc[df['coloridentity'] == u].value_counts()
    color_dict_values[u] = np.array(temp_df)
    color_dict_index[u] = temp_df.index.to_numpy()

width = 0.4
p1 = plt.bar(color_dict_index['G'], color_dict_values['G'], width, color='g')
p2 = plt.bar(color_dict_index['R'], color_dict_values['R'], width, 
             bottom=color_dict_values['G'], color='r')
plt.show()

But this gives me an error: in the line where I set the bottom of the second plot to the values of the first, the two arrays have different numpy shapes.

Does anyone know a solution? I thought of adding 0 values so that the shapes are the same, but I don't know whether that is the best solution and, if so, what the best way to do it would be.

Answer 1

Working with a fixed index (the range of cmc values) makes things easier. That way, the color_dict_values entry for each coloridentity holds a count for every possible cmc value (it stays zero where there are no entries).

The color_dict_index isn't needed any more. To fill in color_dict_values, we iterate over the value_counts result for each color.
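As a side note, the same fixed-index idea can be sketched without the explicit fill-in loop by aligning each value_counts result to the full cmc range with Series.reindex. This is only an illustrative variant, not the code used in this answer; the sample data is copied from the question, and the name counts_per_color is made up for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "cmc": [1, 1, 2, 3, 3, 4],
    "coloridentity": ["G", "R", "G", "G", "B", "B"],
})

# Fixed index: every cmc value from 0 up to the maximum
cmc_range = range(df["cmc"].max() + 1)

# counts_per_color is a hypothetical name used only in this sketch
counts_per_color = {}
for color in df["coloridentity"].unique():
    counts = df.loc[df["coloridentity"] == color, "cmc"].value_counts()
    # reindex aligns the counts to the fixed index; absent cmc values become 0
    counts_per_color[color] = counts.reindex(cmc_range, fill_value=0).to_numpy()

for color, values in counts_per_color.items():
    print(color, values)
```

Every array now has the same length, so any of them can serve as the bottom of another.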

To plot the bars, the x-axis is now the range of possible cmc values. I added [1:] to each array to skip the zero at the beginning, which would look ugly in the plot.

The bottom starts at zero and is incremented by the color_dict_values of the color that has just been plotted. (Thanks to numpy, adding the constant 0 to an array gives back an array with the same values.)

In the code, I generated some random data in the same format as in the question.
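The running bottom can be illustrated in isolation. The snippet below is a minimal sketch with two made-up count arrays (green and red are assumptions for the example); it records the bottom each layer would be drawn on instead of actually plotting.

```python
import numpy as np

# Made-up counts for two colors over the same four cmc values
green = np.array([1, 1, 1, 0])
red = np.array([1, 0, 0, 0])

bottom = 0  # scalar start value; becomes an array after the first addition
bottoms_used = []
for counts in (green, red):
    bottoms_used.append(bottom)  # where this layer would start
    bottom = bottom + counts     # scalar + array broadcasts element-wise

print(bottoms_used[1])  # the second layer starts on top of the green counts
print(bottom)           # total height of each stacked bar
```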

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

N = 50
df = pd.DataFrame({'cmc': np.random.randint(1, 10, N), 'coloridentity': np.random.choice(['R', 'G'], N)})

# get all unique values of coloridentity
unique_values = df['coloridentity'].unique()
# find the range of all cmc indices
max_cmc = df['cmc'].max()
cmc_range = range(max_cmc + 1)

# dictionary for each coloridentity: array of values of each possible cmc
color_dict_values = {}
for u in unique_values:
    value_counts_df = df['cmc'].loc[df['coloridentity'] == u].value_counts()
    color_dict_values[u] = np.zeros(max_cmc + 1, dtype=int)
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for ind, cnt in value_counts_df.items():
        color_dict_values[u][ind] = cnt

width = 0.4
bottom = 0
for col_id, col in zip(['G', 'R'], ['limegreen', 'crimson']):
    plt.bar(cmc_range[1:], color_dict_values[col_id][1:], bottom=bottom, width=width, color=col)
    bottom += color_dict_values[col_id][1:]

plt.xticks(cmc_range[1:]) # make sure every cmc gets a tick label
plt.tick_params(axis='x', length=0) # hide the tick marks
plt.xlabel('cmc')
plt.ylabel('count')
plt.show()
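For comparison, pandas can build the count table and the stacked bars directly. The following is a sketch of a different technique from the loop above (pd.crosstab plus DataFrame.plot.bar with stacked=True), using the sample data from the question; the palette dictionary and the 'royalblue' choice for B are assumptions made for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "cmc": [1, 1, 2, 3, 3, 4],
    "coloridentity": ["G", "R", "G", "G", "B", "B"],
})

# Rows are cmc values, columns are color identities, cells are counts;
# combinations that never occur are filled with 0, so all columns align
counts = pd.crosstab(df["cmc"], df["coloridentity"])

# Hypothetical palette for the example, looked up in column order
palette = {"G": "limegreen", "R": "crimson", "B": "royalblue"}
colors = [palette[c] for c in counts.columns]

# Each column is drawn as one layer of the stacked bars
ax = counts.plot.bar(stacked=True, color=colors, width=0.4)
ax.set_ylabel("count")
```

Calling plt.show() from matplotlib.pyplot afterwards displays the figure. Because crosstab already aligns every color to the same cmc index, the shape mismatch from the question cannot arise here.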
