How do I create hierarchical columns in pandas?

Question

I have a pandas dataframe that looks like this:

          rank_2015   num_2015   rank_2014   num_2014   ....  num_2008
France    8           1200       9           1216       ....  1171
Italy     11          789        6           788        ....  654

Now I want to draw a bar chart of the sums of just the num_ columns, by year. So on the x-axis I would like the years from 2008 to 2015, and on the y-axis the sum of the corresponding num_ column.

What's the best way to do this? I know how to get the sums for each column:

df.sum()

But what I don't know is how to chart only the num_ columns, and also how to re-label those columns so that the labels are integers rather than strings, in order to get them to chart correctly.

I'm wondering if I want to create hierarchical columns, like this:

          rank               num
          2015        2014   2015     2014   ....  2008
France    8           9      1200     1216   ....  1171
Italy     11          6      789      788    ....  654

Then I could just chart the columns in the num section.

How can I get my dataframe into this shape?

Answer 1

You could use str.extract with the regex pattern (.+)_(\d+) to convert the columns to a DataFrame:

cols = df.columns.str.extract(r'(.+)_(\d+)', expand=True)
#       0     1
# 0   num  2008
# 1   num  2014
# 2   num  2015
# 3  rank  2014
# 4  rank  2015
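To see what the two capture groups pick up, here is a small sketch using only the standard re module. The helper split_label and the extra labels 'total_num_2014' and 'country' are made up for illustration, and it uses fullmatch, which is stricter than what str.extract does (str.extract takes the first match found in the string and yields NaN where nothing matches).

```python
import re

# Same pattern as above: group 1 is the text before the last underscore,
# group 2 is the trailing run of digits.
PATTERN = re.compile(r'(.+)_(\d+)')

def split_label(label):
    """Return (prefix, year) for a label like 'num_2015', or None if it does not match."""
    m = PATTERN.fullmatch(label)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

# 'total_num_2014' and 'country' are hypothetical labels, not from the question
labels = ['num_2008', 'rank_2015', 'total_num_2014', 'country']
parsed = [split_label(s) for s in labels]
print(parsed)
```

Because (.+) is greedy, a prefix that itself contains underscores stays in one piece, and a label with no trailing digits is reported as unmatched rather than silently mangled.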

You can then build a hierarchical index (a MultiIndex) from cols and reassign it to df.columns:

df.columns = pd.MultiIndex.from_arrays((cols[0], cols[1]))

so that df becomes

         num             rank     
        2008  2014  2015 2014 2015
France  1171  1216  1200    9    8
Italy    654   788   789    6   11
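Once the columns are hierarchical, picking out one section is a plain lookup. The following is a rough sketch that builds the same frame directly, assuming the year level holds integers (the snippet above leaves them as strings unless you convert them); the variable names are only illustrative.

```python
import pandas as pd

# Build the two-level columns directly; years are integers here by assumption
columns = pd.MultiIndex.from_tuples(
    [('num', 2008), ('num', 2014), ('num', 2015), ('rank', 2014), ('rank', 2015)]
)
df = pd.DataFrame(
    [[1171, 1216, 1200, 9, 8],
     [654, 788, 789, 6, 11]],
    index=['France', 'Italy'],
    columns=columns,
)

num = df['num']                         # every column under 'num', labelled by year
y2015 = df.xs(2015, axis=1, level=1)    # 'num' and 'rank' for a single year
cell = df.loc['France', ('num', 2015)]  # one value, addressed by a (section, year) tuple
```

df['num'] drops the outer level, so the result has the years alone as column labels, which is what the bar chart needs on its x-axis.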

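A complete example, with the year level converted to integers before plotting:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({ 'num_2008': [1171, 654],
 'num_2014': [1216, 788],
 'num_2015': [1200, 789],
 'rank_2014': [9, 6],
 'rank_2015': [8, 11]}, index=['France', 'Italy'])

# split each label into (prefix, year); the result is a two-column DataFrame
cols = df.columns.str.extract(r'(.+)_(\d+)', expand=True)
# make the year labels integers rather than strings
cols[1] = pd.to_numeric(cols[1])
df.columns = pd.MultiIndex.from_arrays((cols[0], cols[1]))
# drop the level names (0 and 1) inherited from cols
df.columns.names = [None]*2

# sum each num column over the rows and plot one bar per year
df['num'].sum().plot(kind='bar')
plt.show()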
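As a possible variation, not part of the answer above: when every label has the form prefix_year, the string accessor's rsplit with expand=True returns a MultiIndex directly, so the intermediate cols frame is not needed. This is a sketch under that assumption; plotting is left out so that it runs without matplotlib.

```python
import pandas as pd

df = pd.DataFrame({'num_2008': [1171, 654],
                   'num_2014': [1216, 788],
                   'num_2015': [1200, 789],
                   'rank_2014': [9, 6],
                   'rank_2015': [8, 11]}, index=['France', 'Italy'])

# Split on the last underscore only; on an Index, expand=True yields a MultiIndex
df.columns = df.columns.str.rsplit('_', n=1, expand=True)

# Convert the year level from strings to integers
df.columns = df.columns.set_levels(df.columns.levels[1].astype(int), level=1)

sums = df['num'].sum()  # one total per year, indexed by integer year
print(sums)
```

Calling sums.plot(kind='bar') would then draw the chart as in the complete example.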

Answer 2

You probably don't need to reshape your dataset; the chart can be produced more simply:

1. Create a new dataset that contains only the num_ data
2. Rename the columns
3. Plot the sums

Code:

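df_num = df[[c for c in df.columns if c.startswith('num_')]].copy()
# slice off the prefix and convert to int; str.lstrip('num_') strips a set of
# characters rather than a prefix, so it is not safe for this in general
df_num.columns = [int(c[len('num_'):]) for c in df_num.columns]
df_num.sum().plot(kind='bar')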
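The three steps can also be written as one chain. This is a minimal sketch with made-up dummy data matching the question's shape; DataFrame.filter and rename are used here in place of the list comprehensions, and the plot call is omitted so that it runs without matplotlib.

```python
import pandas as pd

# Dummy data (values taken from the question's sample rows)
df = pd.DataFrame({'num_2008': [1171, 654],
                   'num_2014': [1216, 788],
                   'num_2015': [1200, 789],
                   'rank_2014': [9, 6],
                   'rank_2015': [8, 11]}, index=['France', 'Italy'])

num_sums = (
    df.filter(regex=r'^num_')                          # keep only the num_ columns
      .rename(columns=lambda c: int(c[len('num_'):]))  # 'num_2015' -> 2015
      .sum()                                           # total per year
      .sort_index()                                    # years in ascending order
)
print(num_sums)

# Why the prefix is sliced off instead of using str.lstrip: lstrip removes any
# leading characters from the given set, not a literal prefix.
stripped = 'num_unit'.lstrip('num_')  # 'num_unit' is a hypothetical label
```

With year-only labels str.lstrip happens to give the right answer, since digits are not in the stripped set, but a label such as the hypothetical 'num_unit' would lose more than its prefix.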


2 answers

Answer 1
4, accepted, 2016-10-09 15:33:20

Answer 2
1, 2016-10-09 15:33:57
