Change stacking of dataframe in pandas
I have a dataframe that looks like this:
Name 2012 2013 2014
7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
A a b c d e f g a b c d e f g h i j k l m a b c d e f g h i j k l m
B a b c d e f g a b c d e f g h i j k l m a b c d e f g h i j k l m
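and so on. 2012, 2013 and 2014 are the years, the row below them holds the months, and a, b, c, d, e ... stand for the values of each Name (A, B, ...) in the respective months. The values differ for each name; they are shown here only for illustration.
So far I have done the following: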
import pandas as pd

workbook = pd.ExcelFile('XYZ.xlsx')
# sheet_name replaces the older sheetname keyword in current pandas
df = workbook.parse(sheet_name='Page1-2')
df2 = pd.melt(df, id_vars=["Name"],
              var_name="Date", value_name="Value")
That is, I imported the XYZ.xlsx file into df and used pd.melt to reshape df into df2. The output of df2 looks like this:
Name Date Value
A 2012 a
A Unnamed b
A Unnamed c
A Unnamed d
A Unnamed e
A Unnamed f
A Unnamed g
A 2013 a
A Unnamed b
A Unnamed c
A Unnamed d
A Unnamed e
and so on for the other years and names. I would like my Date column to show something like this:
Date
7/2012
8/2012
9/2012
10/2012
11/2012
12/2012
1/2013
2/2013
3/2013
4/2013
5/2013
6/2013
7/2013
8/2013
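according to the months and years given in the initial dataframe. I am not sure how to do this. Any help is greatly appreciated!
Output of print(df.to_dict()) for a sample of the data: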
{'Name': {0: nan, 1: 'A', 2: 'B'}, 2012: {0: '07', 1: 'a', 2: 'a'},'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 3': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 4': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 5': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 6': {0: '12', 1: 'f', 2: 'f'}, '2013': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 9': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 10': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 11': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 12': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 13': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 14': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 15': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 16': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 17': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 18': {0: '12', 1: 'f', 2: 'f'}, '2014': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 20': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 21': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 22': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 23': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 24': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 25': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 26': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 27': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 28': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 29': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 30': {0: '12', 1: 'f', 2: 'f'}}
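For anyone who wants to reproduce the problem without the Excel file, here is a minimal sketch that rebuilds a cut-down version of the dict above and runs the same pd.melt call. The reduced column set and the variable names sample and melted are assumptions made for illustration, and the year labels are written as strings for simplicity.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the dict above (only four month columns).
# Row 0 holds the month labels; blank year cells appear as 'Unnamed: N'.
sample = {
    'Name': {0: np.nan, 1: 'A', 2: 'B'},
    '2012': {0: '11', 1: 'e', 2: 'e'},
    'Unnamed: 2': {0: '12', 1: 'f', 2: 'f'},
    '2013': {0: '01', 1: 'a', 2: 'a'},
    'Unnamed: 4': {0: '02', 1: 'b', 2: 'b'},
}
df = pd.DataFrame(sample)

# Melting directly keeps the 'Unnamed: N' labels in the Date column
# and treats the month row as ordinary data.
melted = pd.melt(df, id_vars=['Name'], var_name='Date', value_name='Value')
print(melted)
```

The Date column ends up with the raw header labels, which is why the year has to be forward-filled and combined with the month row before reshaping.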
Use:
#create index with column Name
df = df.set_index('Name')
#create MultiIndex from the columns (years forward-filled over the Unnamed labels) and the first row
#astype(str) guards against non-string labels such as the integer 2012 in the sample dict
idx = pd.Series(df.columns.astype(str))
df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                        df.iloc[0]], names=('Date','Month'))
#remove first row
df = df.iloc[1:]
print (df)
Date 2012 2013 ... 2014
Month 07 08 09 10 11 12 01 02 03 04 ... 03 04 05 06 07 08 09 10 11 12
Name ...
A a b c d e f a b c d ... c d e f g h i j k l
B a b c d e f a b c d ... c d e f g h i j k l
print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Month'])
#reshape
df2 = df.unstack().reset_index(name='Value')
df2['Date'] = df2['Month'] + '/' + df2['Date']
df2 = df2.drop('Month', axis=1)
print (df2)
Date Name Value
0 07/2012 A a
1 07/2012 B a
2 08/2012 A b
3 08/2012 B b
4 09/2012 A c
5 09/2012 B c
6 10/2012 A d
7 10/2012 B d
8 11/2012 A e
9 11/2012 B e
10 12/2012 A f
11 12/2012 B f
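The steps above can be put together as one self-contained sketch on a small inline frame. The helper name to_long, the level name Year and the four-column sample are assumptions for illustration, not part of the original answer. It also strips the leading zero from the month so the result matches the 7/2012 style asked for in the question.

```python
import numpy as np
import pandas as pd

# Small stand-in for the sheet: row 0 carries the months,
# blank year cells were read as 'Unnamed: N'.
df = pd.DataFrame({
    'Name': [np.nan, 'A', 'B'],
    '2012': ['11', 'e', 'e'],
    'Unnamed: 2': ['12', 'f', 'f'],
    '2013': ['01', 'a', 'a'],
    'Unnamed: 4': ['02', 'b', 'b'],
})

def to_long(frame):
    """Reshape the wide year/month layout into Name, Date, Value rows."""
    frame = frame.set_index('Name')
    # Forward-fill each year over the 'Unnamed' placeholders to its right
    years = pd.Series(frame.columns.astype(str))
    years = years.mask(years.str.contains('Unnamed')).ffill()
    # First row holds the months; drop the zero padding ('07' -> '7')
    months = frame.iloc[0].astype(str).str.lstrip('0')
    frame = frame.iloc[1:].copy()
    frame.columns = pd.MultiIndex.from_arrays(
        [years.tolist(), months.tolist()], names=('Year', 'Month'))
    # unstack on a single-level index returns a long Series
    out = frame.unstack().reset_index(name='Value')
    out['Date'] = out['Month'] + '/' + out['Year']
    return out[['Name', 'Date', 'Value']]

long_df = to_long(df)
print(long_df)
```

Keeping the zero padding instead only requires dropping the str.lstrip('0') call.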
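If df can be read from a file, add the parameter header=[0,1] to read the first and second rows into a MultiIndex and index_col=[0] to read the first column Name into the index. The solution then changes slightly: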
df = pd.read_csv('filename', header=[0,1], index_col=[0])
idx = pd.Series(df.columns.get_level_values(0))
df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                        df.columns.get_level_values(1)],
                                       names=('Date','Month'))
print (df)
Date 2012 2013 ... 2014
Month 07 08 09 10 11 12 01 02 03 04 ... 03 04 05 06 07 08 09 10 11 12
Name ...
A a b c d e f a b c d ... c d e f g h i j k l
B a b c d e f a b c d ... c d e f g h i j k l
print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Month'])
#reshape
df2 = df.unstack().reset_index(name='Value').rename(columns={'level_2':'Name'})
df2['Date'] = df2['Month'].astype(str) + '/' + df2['Date'].astype(str)
#optionally convert to datetimes (first day of each month); %Y matches the 4-digit year
#df2['Date'] = pd.to_datetime(df2['Date'].radd('1/'), format='%d/%m/%Y')
df2 = df2.drop('Month', axis=1)
print (df2)
Date Name Value
0 07/2012 A a
1 07/2012 B a
2 08/2012 A b
3 08/2012 B b
4 09/2012 A c
5 09/2012 B c
6 10/2012 A d
7 10/2012 B d
8 11/2012 A e
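To try the file-based variant without a real file, here is a rough sketch that reads an in-memory CSV laid out like the sheet and converts the result to real datetimes. The CSV text, the level name Year and the variable names are assumptions for illustration. The index name is set explicitly because read_csv may leave it empty with a two-row header.

```python
import io
import pandas as pd

# Hypothetical CSV with the same layout: a year row, then a month row
csv_text = (
    "Name,2012,,2013,\n"
    ",11,12,01,02\n"
    "A,e,f,a,b\n"
    "B,e,f,a,b\n"
)
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=[0])

# Blank year cells come back as 'Unnamed: ...' labels; forward-fill the years
years = pd.Series(df.columns.get_level_values(0).astype(str))
years = years.mask(years.str.contains('Unnamed')).ffill()
months = df.columns.get_level_values(1).astype(str)
df.columns = pd.MultiIndex.from_arrays(
    [years.tolist(), months.tolist()], names=('Year', 'Month'))
df.index.name = 'Name'

# Reshape to long form and parse month/year into a datetime column
long_df = df.unstack().reset_index(name='Value')
long_df['Date'] = pd.to_datetime(long_df['Month'] + '/' + long_df['Year'],
                                 format='%m/%Y')
long_df = long_df[['Name', 'Date', 'Value']]
print(long_df)
```

A datetime column sorts chronologically and can be formatted back to text later with dt.strftime, which a plain month/year string cannot do.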