Convert a subset of a pandas DataFrame into a multi-indexed DataFrame

Question

I have the following dataframe:

df.head(14)

I'd like to transpose just the yr and the ['WA_','BA_','IA_','AA_','NA_','TOM_'] variables by Label. The resulting dataframe should then be a multi-indexed frame with Label and WA_, BA_, etc. as the index, and the column names will be 2010, 2011, etc. I've tried transpose(), groupby(), pivot_table(), and long_to_wide(), and before I roll my own nested loop going line by line through this df, I thought I'd ping the community. Something like this for every Label group:

I feel like the answer is in one of those functions but I'm just missing it. Thanks for your help!

Answer 1

From what I can tell from your screenshots, you want WA_, BA_, etc. as rows and yr as columns, with Label remaining as a row index. If so, consider stack() and unstack():

import numpy as np
import pandas as pd

# sample data
labels = ["Albany County","Big Horn County"]
n_per_label = 7
n_rows = n_per_label * len(labels)
years = np.arange(2010, 2017)
min_val = 10000
max_val = 40000

data = {"Label": sorted(np.array(labels * n_per_label)),
        "WA_": np.random.randint(min_val, max_val, n_rows),
        "BA_": np.random.randint(min_val, max_val, n_rows),
        "IA_": np.random.randint(min_val, max_val, n_rows),
        "AA_": np.random.randint(min_val, max_val, n_rows),
        "NA_": np.random.randint(min_val, max_val, n_rows),
        "TOM_": np.random.randint(min_val, max_val, n_rows),
        "yr": np.tile(years, len(labels))  # repeat the years once per label
       }
df = pd.DataFrame(data)
      AA_    BA_    IA_    NA_   TOM_    WA_            Label    yr
0   27757  23138  10476  20047  34015  12457    Albany County  2010
1   37135  30525  12296  22809  27235  29045    Albany County  2011
2   11017  16448  17955  33310  11956  19070    Albany County  2012
3   24406  21758  15538  32746  38139  39553    Albany County  2013
4   29874  33105  23106  30216  30176  13380    Albany County  2014
5   24409  27454  14510  34497  10326  29278    Albany County  2015
6   31787  11301  39259  12081  31513  13820    Albany County  2016
7   17119  20961  21526  37450  14937  11516  Big Horn County  2010
8   13663  33901  12420  27700  30409  26235  Big Horn County  2011
9   37861  39864  29512  24270  15853  29813  Big Horn County  2012
10  29095  27760  12304  29987  31481  39632  Big Horn County  2013
11  26966  39095  39031  26582  22851  18194  Big Horn County  2014
12  28216  33354  35498  23514  23879  17983  Big Horn County  2015
13  25440  28405  23847  26475  20780  29692  Big Horn County  2016

Now set Label and yr as indices.

df.set_index(["Label","yr"], inplace=True)
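As a quick check of what that call produces, here is a minimal sketch on a tiny made-up frame. The names small and indexed and all values are illustrative, not the asker's data. It uses the non-inplace form of set_index(), which returns a new frame, and shows that yr ends up as the inner-most index level.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
small = pd.DataFrame({
    "Label": ["A", "A", "B", "B"],
    "yr": [2010, 2011, 2010, 2011],
    "WA_": [1, 2, 3, 4],
})

# Non-inplace equivalent of df.set_index([...], inplace=True):
# returns a new frame and leaves `small` unchanged.
indexed = small.set_index(["Label", "yr"])

print(indexed.index.names)    # level names, outer-most first
print(indexed.index.nlevels)  # number of index levels
```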

From here, unstack() will pivot the inner-most index level (yr) to columns. Then stack(level=0) swings the value-column names, which now form the outer column level, down into rows.

df.unstack().stack(level=0)

yr                     2010   2011   2012   2013   2014   2015   2016
Label                                                               
Albany County   AA_   27757  37135  11017  24406  29874  24409  31787
                BA_   23138  30525  16448  21758  33105  27454  11301
                IA_   10476  12296  17955  15538  23106  14510  39259
                NA_   20047  22809  33310  32746  30216  34497  12081
                TOM_  34015  27235  11956  38139  30176  10326  31513
                WA_   12457  29045  19070  39553  13380  29278  13820
Big Horn County AA_   17119  13663  37861  29095  26966  28216  25440
                BA_   20961  33901  39864  27760  39095  33354  28405
                IA_   21526  12420  29512  12304  39031  35498  23847
                NA_   37450  27700  24270  29987  26582  23514  26475
                TOM_  14937  30409  15853  31481  22851  23879  20780
                WA_   11516  26235  29813  39632  18194  17983  29692
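If the real frame also has columns other than the ones to be reshaped, one option is to select the subset first and then apply the same unstack() and stack() steps. The sketch below assumes a small made-up frame with a hypothetical extra column named other; the names tiny, value_cols and wide are likewise illustrative and not from the question.

```python
import pandas as pd

# Made-up data: two labels, two years, two value columns, plus one
# hypothetical column ("other") that should not be reshaped.
tiny = pd.DataFrame({
    "Label": ["A", "A", "B", "B"],
    "yr": [2010, 2011, 2010, 2011],
    "WA_": [1, 2, 3, 4],
    "BA_": [5, 6, 7, 8],
    "other": ["w", "x", "y", "z"],
})

value_cols = ["WA_", "BA_"]

wide = (
    tiny.set_index(["Label", "yr"])[value_cols]  # keep only the subset
        .unstack()        # yr (inner-most index level) moves to the columns
        .stack(level=0)   # value-column names move down into the row index
)
print(wide)

# Look a single value up by (Label, variable) and year.
print(wide.loc[("A", "WA_"), 2010])
```

Row order within each Label can depend on the pandas version, so it is safer to look values up by key, as in the last line, than by position.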
