pandas dataframe reshaping/stacking of multiple value variables into separate columns
Hi, I'm trying to reshape a data frame in a certain way.
This is the data frame I have:
      des1 des2 des3 interval1 interval2 interval3
value
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9
des1 corresponds to interval1, and so on. The interval columns hold date ranges and the des columns hold descriptions.
I'd like to reshape the dataframe so that it looks like this:
      des interval
value
aaa     a      ##1
aaa     b      ##2
aaa     c      ##3
bbb     d      ##4
bbb     e      ##5
bbb     f      ##6
ccc     g      ##7
ccc     h      ##8
ccc     i      ##9
How would I go about doing this? I'm a little familiar with .stack(), but I haven't been able to get exactly what I want.
Thank you for your help. Feel free to post references.
This might be a shorter approach:
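In [72]:
# Two-level columns: level 0 is the stub ('des' / 'interval'), level 1 the original name
df.columns = pd.MultiIndex.from_tuples([(x[:-1], x) for x in df.columns])
In [73]:
# unique() keeps the stubs in column order, unlike set()
keys = df.columns.get_level_values(0).unique()
print(pd.DataFrame({key: df[key].stack().values for key in keys},
                   index=df['des'].stack().index.get_level_values(0)))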
      des interval
value
aaa     a      ##1
aaa     b      ##2
aaa     c      ##3
bbb     d      ##4
bbb     e      ##5
bbb     f      ##6
ccc     g      ##7
ccc     h      ##8
ccc     i      ##9
Or preserve the 1, 2, 3 info:
In [73]:
# Starting again from the original df: level 0 is the stub, level 1 the trailing digit
df.columns = pd.MultiIndex.from_tuples([(x[:-1], x[-1]) for x in df.columns])
keys = df.columns.get_level_values(0).unique()
df2 = pd.concat([df[key].stack() for key in keys], axis=1)
df2.columns = keys
print(df2)
        des interval
value
aaa   1   a      ##1
      2   b      ##2
      3   c      ##3
bbb   1   d      ##4
      2   e      ##5
      3   f      ##6
ccc   1   g      ##7
      2   h      ##8
      3   i      ##9
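As a variation on the same idea, here is a minimal sketch that stacks the digit level directly instead of stacking each stub separately. The frame is rebuilt from the question's data; the names wide, long and the level name "num" are made up for illustration. It assumes every column name ends in a single digit, and recent pandas versions may emit a FutureWarning about the stack implementation.

```python
import pandas as pd

# Example frame rebuilt from the question (index named "value").
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

wide = df.copy()
# Split each column name into (stub, trailing digit); assumes a single-digit suffix.
wide.columns = pd.MultiIndex.from_tuples(
    [(name[:-1], name[-1]) for name in wide.columns], names=[None, "num"]
)

# Moving the "num" level into the row index leaves one column per stub.
long = wide.stack(level="num").sort_index()
print(long)
```

If the 1, 2, 3 level is not needed, long.droplevel("num") gives the layout asked for in the question.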
This is just a .melt; the pandas.melt documentation covers the details.
In [33]: pd.melt(df.reset_index(),
id_vars=['values'],
value_vars=['interval1','interval2','interval3'])
Out[33]:
  values   variable value
0    aaa  interval1   ##1
1    bbb  interval1   ##4
2    ccc  interval1   ##7
3    aaa  interval2   ##2
4    bbb  interval2   ##5
5    ccc  interval2   ##8
6    aaa  interval3   ##3
7    bbb  interval3   ##6
8    ccc  interval3   ##9
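The melt above only covers the interval columns, so the des values are not in the result. One possible way to get both, sketched here under the assumption that the index is named "values" as in the output above: melt each group of columns separately and put the results side by side. The names flat, des, interval and paired are just for illustration.

```python
import pandas as pd

# Example frame rebuilt from the question; the index name "values" is an assumption
# taken from the melt output in this answer.
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="values"),
)
flat = df.reset_index()

des = pd.melt(flat, id_vars=["values"],
              value_vars=["des1", "des2", "des3"], value_name="des")
interval = pd.melt(flat, id_vars=["values"],
                   value_vars=["interval1", "interval2", "interval3"],
                   value_name="interval")

# Both melts walk the columns in the same order, so their rows line up by position.
paired = pd.concat([des[["values", "des"]], interval["interval"]], axis=1)

# A stable sort keeps the 1, 2, 3 order within each label.
paired = paired.sort_values("values", kind="mergesort").set_index("values")
print(paired)
```

This relies on the des and interval lists being given in matching order; if they are not, the pairs will be wrong.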
I think the solution provided by CT Zhu is ingenious, but you can also reshape the frame step by step (perhaps the more common way).
import pandas as pd

# Rebuild the frame from the question; the blank first row mimics the 'value' header line
d = {'des1' : ['', 'a', 'd', 'g'],
     'des2' : ['', 'b', 'e', 'h'],
     'des3' : ['', 'c', 'f', 'i'],
     'interval1' : ['', '##1', '##4', '##7'],
     'interval2' : ['', '##2', '##5', '##8'],
     'interval3' : ['', '##3', '##6', '##9']}
df = pd.DataFrame(d, index=['value', 'aaa', 'bbb', 'ccc'],
                  columns=['des1', 'des2', 'des3', 'interval1', 'interval2', 'interval3'])
# Concatenate each row's des values (columns 0-2) and interval values (columns 3-5)
nd = {'des' : [''] + df.iloc[1, 0:3].tolist() + df.iloc[2, 0:3].tolist() + df.iloc[3, 0:3].tolist(),
      'interval' : [''] + df.iloc[1, 3:6].tolist() + df.iloc[2, 3:6].tolist() + df.iloc[3, 3:6].tolist()}
ndf = pd.DataFrame(nd, index=['value', 'aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc'], columns=['des', 'interval'])
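The step-by-step version hard-codes the row positions, so it only fits a frame with exactly three rows. A rough sketch of the same idea for any number of rows, assuming the index is a proper named index (no blank 'value' row) and that the des and interval columns appear in matching order; des_cols and interval_cols are names picked for this example.

```python
import pandas as pd

# Example frame rebuilt from the question (index named "value").
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

des_cols = [c for c in df.columns if c.startswith("des")]
interval_cols = [c for c in df.columns if c.startswith("interval")]

# ravel() flattens row by row, which matches repeating each index label in place.
ndf = pd.DataFrame(
    {
        "des": df[des_cols].to_numpy().ravel(),
        "interval": df[interval_cols].to_numpy().ravel(),
    },
    index=df.index.repeat(len(des_cols)),
)
print(ndf)
```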
This type of reshaping can be done conveniently with pandas.wide_to_long:
import io
import pandas as pd # v 1.2.3
data = '''
value des1 des2 des3 interval1 interval2 interval3
aaa a b c ##1 ##2 ##3
bbb d e f ##4 ##5 ##6
ccc g h i ##7 ##8 ##9
'''
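# sep=r'\s+' splits on whitespace and also works where delim_whitespace is deprecated
df = pd.read_csv(io.StringIO(data), index_col=0, sep=r'\s+')
# Sort on (value, var_id) before dropping var_id so the 1, 2, 3 order is kept within each label
pd.wide_to_long(df.reset_index(), stubnames=['des', 'interval'],
                i='value', j='var_id').sort_index().droplevel(1)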
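wide_to_long expects each column name to be a stub followed directly by the suffix. If the names had a separator instead (say des_1 rather than des1, which is not the case in the question and is only assumed here), the sep parameter handles it. A small sketch with a cut-down, made-up version of the data:

```python
import pandas as pd

# Hypothetical variant of the question's data with "_" between stub and number.
df = pd.DataFrame(
    {
        "value": ["aaa", "bbb"],
        "des_1": ["a", "d"],
        "des_2": ["b", "e"],
        "interval_1": ["##1", "##4"],
        "interval_2": ["##2", "##5"],
    }
)

# sep="_" tells wide_to_long what sits between the stub name and the numeric suffix.
long = pd.wide_to_long(
    df, stubnames=["des", "interval"], i="value", j="var_id", sep="_"
).sort_index()
print(long)
```

The numeric suffix ends up in the var_id index level, so it can be kept or dropped with droplevel as needed.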