Transpose subset of pandas dataframe into multi-indexed data frame
I have the following dataframe:
df.head(14)
I'd like to transpose just the yr and the ['WA_','BA_','IA_','AA_','NA_','TOM_'] variables by Label. The resulting dataframe should then be a multi-indexed frame indexed by Label and the WA_, BA_, etc. variables, and the column names will be 2010, 2011, etc. I've tried transpose(), groupby(), pivot_table(), long_to_wide(), and before I roll my own nested loop going line by line through this df I thought I'd ping the community. Something like this by every Label group:

I feel like the answer is in one of those functions but I'm just missing it. Thanks for your help!
From what I can tell by your illustrated screenshots, you want WA_, BA_, etc. as rows and yr as columns, with Label remaining as a row index. If so, consider stack() and unstack():
import numpy as np
import pandas as pd

# sample data
labels = ["Albany County", "Big Horn County"]
n_per_label = 7
n_rows = n_per_label * len(labels)
years = np.arange(2010, 2017)
min_val = 10000
max_val = 40000
data = {"Label": sorted(labels * n_per_label),
        "WA_": np.random.randint(min_val, max_val, n_rows),
        "BA_": np.random.randint(min_val, max_val, n_rows),
        "IA_": np.random.randint(min_val, max_val, n_rows),
        "AA_": np.random.randint(min_val, max_val, n_rows),
        "NA_": np.random.randint(min_val, max_val, n_rows),
        "TOM_": np.random.randint(min_val, max_val, n_rows),
        "yr": np.tile(years, len(labels)),  # one run of 2010-2016 per label
        }
df = pd.DataFrame(data)
AA_ BA_ IA_ NA_ TOM_ WA_ Label yr
0 27757 23138 10476 20047 34015 12457 Albany County 2010
1 37135 30525 12296 22809 27235 29045 Albany County 2011
2 11017 16448 17955 33310 11956 19070 Albany County 2012
3 24406 21758 15538 32746 38139 39553 Albany County 2013
4 29874 33105 23106 30216 30176 13380 Albany County 2014
5 24409 27454 14510 34497 10326 29278 Albany County 2015
6 31787 11301 39259 12081 31513 13820 Albany County 2016
7 17119 20961 21526 37450 14937 11516 Big Horn County 2010
8 13663 33901 12420 27700 30409 26235 Big Horn County 2011
9 37861 39864 29512 24270 15853 29813 Big Horn County 2012
10 29095 27760 12304 29987 31481 39632 Big Horn County 2013
11 26966 39095 39031 26582 22851 18194 Big Horn County 2014
12 28216 33354 35498 23514 23879 17983 Big Horn County 2015
13 25440 28405 23847 26475 20780 29692 Big Horn County 2016
Now set Label and yr as indices:
df.set_index(["Label","yr"], inplace=True)
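To make the intermediate state concrete, here is a small self-contained sketch of what set_index() followed by a single unstack() produces. The four-row frame, its labels "A"/"B" and its values are made up for illustration and only mimic the shape of the real data.

```python
import pandas as pd

# Tiny hypothetical frame (made-up numbers): 2 labels x 2 years x 2 variables
df = pd.DataFrame({
    "Label": ["A", "A", "B", "B"],
    "yr": [2010, 2011, 2010, 2011],
    "WA_": [1, 2, 3, 4],
    "BA_": [5, 6, 7, 8],
})
df = df.set_index(["Label", "yr"])  # rows are now a (Label, yr) MultiIndex

wide = df.unstack()  # moves the innermost row level (yr) into the columns
print(wide)
# Columns are now two-level: (variable, yr); rows are just Label
print(wide.columns.tolist())
```

At this point each Label is a single row and every (variable, yr) pair is its own column, so one more reshape is needed to bring the variable names back down into the rows.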
From here, unstack() will pivot the innermost index level (yr) to columns, giving two-level columns of (variable, yr). Then stack(level=0) can swing the outer column level, our value columns, down into rows:
df.unstack().stack(level=0)
yr 2010 2011 2012 2013 2014 2015 2016
Label
Albany County AA_ 27757 37135 11017 24406 29874 24409 31787
BA_ 23138 30525 16448 21758 33105 27454 11301
IA_ 10476 12296 17955 15538 23106 14510 39259
NA_ 20047 22809 33310 32746 30216 34497 12081
TOM_ 34015 27235 11956 38139 30176 10326 31513
WA_ 12457 29045 19070 39553 13380 29278 13820
Big Horn County AA_ 17119 13663 37861 29095 26966 28216 25440
BA_ 20961 33901 39864 27760 39095 33354 28405
IA_ 21526 12420 29512 12304 39031 35498 23847
NA_ 37450 27700 24270 29987 26582 23514 26475
TOM_ 14937 30409 15853 31481 22851 23879 20780
WA_ 11516 26235 29813 39632 18194 17983 29692
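Since pivot_table() was one of the functions tried in the question, here is a hedged sketch that wraps the set_index/unstack/stack chain above in a hypothetical helper, transpose_by_label, and cross-checks it against a different route: melt() to long format, then pivot() back to wide. The helper names, the "variable" level name and the small demo frame with made-up values are assumptions for illustration. Both results are passed through sort_index() so the comparison does not depend on how a given pandas version orders the stacked level; on recent pandas versions stack() may also emit a FutureWarning about its new implementation.

```python
import numpy as np
import pandas as pd

VALUE_COLS = ["WA_", "BA_", "IA_", "AA_", "NA_", "TOM_"]


def transpose_by_label(df, value_cols=VALUE_COLS):
    """Hypothetical helper: (Label, variable) rows x yr columns via unstack/stack."""
    out = (df.set_index(["Label", "yr"])[value_cols]
             .unstack("yr")       # yr becomes the inner column level
             .stack(level=0))     # variable names move down into the rows
    out.index.names = ["Label", "variable"]
    return out.sort_index()


def transpose_by_label_melt(df, value_cols=VALUE_COLS):
    """Alternative sketch: go fully long with melt(), then pivot() back to wide."""
    long = df.melt(id_vars=["Label", "yr"], value_vars=value_cols,
                   var_name="variable", value_name="value")
    # pivot() raises on duplicate (Label, variable, yr) keys instead of aggregating
    out = long.pivot(index=["Label", "variable"], columns="yr", values="value")
    return out.sort_index()


# Small deterministic input (made-up values): 2 labels x 3 years
labels = ["Albany County", "Big Horn County"]
years = [2010, 2011, 2012]
demo = pd.DataFrame({"Label": np.repeat(labels, len(years)),
                     "yr": np.tile(years, len(labels))})
for i, col in enumerate(VALUE_COLS):
    demo[col] = np.arange(len(demo)) + 100 * i

a = transpose_by_label(demo)
b = transpose_by_label_melt(demo)
pd.testing.assert_frame_equal(a, b)  # raises AssertionError if the routes disagree
print(a.shape)  # (12, 3)
```

The pivot() route is stricter than pivot_table(): pivot() raises a ValueError if a (Label, variable, yr) combination appears twice, whereas pivot_table() silently aggregates duplicates (with the mean by default).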