
Change stacking of dataframe in pandas

I have a dataframe that looks like this.

Name          2012                  2013                       2014  
         7 8 9 10 11 12   1 2 3 4 5 6 7 8 9 10 11 12  1 2 3 4 5 6 7 8 9 10 11 12
 A       a b c d e f g    a b c d e f g h i j k l m   a b c d e f g h i j k l m
 B       a b c d e f g    a b c d e f g h i j k l m   a b c d e f g h i j k l m

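And so on. 2012, 2013 and 2014 are the years, the numbers beneath them are the months, and a, b, c, d, e ... are the values for each Name (A, B, ...) in each month. The values a, b, c, d, e ... differ for every name; they are shown here for illustration only.

So far, I have done the following: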

import pandas as pd

workbook = pd.ExcelFile('XYZ.xlsx')
# sheet_name is the current spelling of the older sheetname keyword
df = workbook.parse(sheet_name='Page1-2')
df2 = pd.melt(df, id_vars=["Name"],
              var_name="Date", value_name="Value")

That is, I imported the file XYZ.xlsx into df, then used pd.melt to reshape df into df2. The output of df2 looks like this:

Name Date      Value
 A   2012      a
 A   Unnamed   b
 A   Unnamed   c
 A   Unnamed   d
 A   Unnamed   e
 A   Unnamed   f
 A   Unnamed   g
 A   2013      a
 A   Unnamed   b
 A   Unnamed   c
 A   Unnamed   d
 A   Unnamed   e

And so on for the other years and names. I would like my Date column to look like this:

 Date
 7/2012
 8/2012
 9/2012
 10/2012
 11/2012
 12/2012
 1/2013
 2/2013
 3/2013
 4/2013
 5/2013
 6/2013
 7/2013
 8/2013

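based on the months and years given in the initial dataframe. I am not sure how to do this. Any help is much appreciated!

Output of print(df.to_dict()) for the sample dataframe: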

{'Name': {0: nan, 1: 'A', 2: 'B'}, 2012: {0: '07', 1: 'a', 2: 'a'},'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 3': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 4': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 5': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 6': {0: '12', 1: 'f', 2: 'f'}, '2013': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 9': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 10': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 11': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 12': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 13': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 14': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 15': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 16': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 17': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 18': {0: '12', 1: 'f', 2: 'f'}, '2014': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 20': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 21': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 22': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 23': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 24': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 25': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 26': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 27': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 28': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 29': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 30': {0: '12', 1: 'f', 2: 'f'}} 
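To reproduce the situation without the Excel file, a shortened sample can be built by hand. This is only a sketch: the dict below is a cut-down, made-up stand-in for the printed one (two months of 2012, three of 2013), and it shows how melting it directly puts the 'Unnamed: N' placeholders into the Date column.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the printed dict (assumption: same layout, fewer
# columns). Row 0 holds the month numbers; the data rows start at row 1.
sample = {
    'Name': {0: np.nan, 1: 'A', 2: 'B'},
    '2012': {0: '11', 1: 'a', 2: 'a'},
    'Unnamed: 2': {0: '12', 1: 'b', 2: 'b'},
    '2013': {0: '01', 1: 'c', 2: 'c'},
    'Unnamed: 4': {0: '02', 1: 'd', 2: 'd'},
    'Unnamed: 5': {0: '03', 1: 'e', 2: 'e'},
}
df = pd.DataFrame(sample)

# Melting right away keeps the raw column labels, placeholders included.
melted = pd.melt(df, id_vars=['Name'], var_name='Date', value_name='Value')
print(melted['Date'].unique().tolist())
```

The year appears only above the first month of each block, so every other column gets a placeholder label that has to be replaced before reshaping.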

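Use:

# move the Name column into the index
df = df.set_index('Name')
# build a MultiIndex from the column labels (years, with the 'Unnamed: N'
# placeholders masked and forward-filled) and the first row (months);
# astype(str) guards against a year label that was read as an integer
idx = pd.Series(df.columns.astype(str))
df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
              df.iloc[0]], names=('Date','Month'))
# drop the first row, which held the month numbers
df = df.iloc[1:]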


print (df)
Date  2012                2013          ... 2014                           
Month   07 08 09 10 11 12   01 02 03 04 ...   03 04 05 06 07 08 09 10 11 12
Name                                    ...                                
A        a  b  c  d  e  f    a  b  c  d ...    c  d  e  f  g  h  i  j  k  l
B        a  b  c  d  e  f    a  b  c  d ...    c  d  e  f  g  h  i  j  k  l

print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
           names=['Date', 'Month'])
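The mask-and-forward-fill step used for the year level can be tried in isolation. This is a minimal sketch; the label list is a shortened, made-up stand-in for the real column labels.

```python
import pandas as pd

# Column labels as pandas produces them when only the first month of each
# year carries the year in the header row.
labels = pd.Series(['2012', 'Unnamed: 2', 'Unnamed: 3', '2013', 'Unnamed: 5'])

# mask() blanks out the placeholders, ffill() copies the last seen year down.
is_placeholder = labels.str.contains('Unnamed:')
years = labels.mask(is_placeholder).ffill()
print(years.tolist())
# ['2012', '2012', '2012', '2013', '2013']
```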

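# reshape: move both column levels into rows, one row per (Date, Month, Name)
df2 = df.unstack().reset_index(name='Value')
# combine month and year into a single label such as 07/2012
df2['Date'] = df2['Month'] + '/' + df2['Date']
df2 = df2.drop('Month', axis=1)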
print (df2)
       Date Name Value
0   07/2012    A     a
1   07/2012    B     a
2   08/2012    A     b
3   08/2012    B     b
4   09/2012    A     c
5   09/2012    B     c
6   10/2012    A     d
7   10/2012    B     d
8   11/2012    A     e
9   11/2012    B     e
10  12/2012    A     f
11  12/2012    B     f
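For reference, the steps above can be collected into one function and run on a small hand-made frame. This is a sketch, not a definitive implementation: to_long is a hypothetical helper name, and the sample is a shortened stand-in for the real sheet.

```python
import numpy as np
import pandas as pd


def to_long(df):
    """Hypothetical helper: wide year/month sheet -> long Date/Name/Value."""
    wide = df.set_index('Name')
    # Year level: blank out the 'Unnamed: N' placeholders, then forward-fill.
    labels = pd.Series(wide.columns.astype(str))
    years = labels.mask(labels.str.contains('Unnamed:')).ffill()
    # Month level: the first row of the sheet.
    months = wide.iloc[0].astype(str).to_numpy()
    wide.columns = pd.MultiIndex.from_arrays([years, months],
                                             names=('Date', 'Month'))
    wide = wide.iloc[1:]  # drop the row that held the month numbers
    out = wide.unstack().reset_index(name='Value')
    out['Date'] = out['Month'] + '/' + out['Date']
    return out.drop(columns='Month')


# Shortened, made-up sample with the same layout as the question's data.
df = pd.DataFrame({
    'Name': {0: np.nan, 1: 'A', 2: 'B'},
    '2012': {0: '11', 1: 'a', 2: 'a'},
    'Unnamed: 2': {0: '12', 1: 'b', 2: 'b'},
    '2013': {0: '01', 1: 'c', 2: 'c'},
    'Unnamed: 4': {0: '02', 1: 'd', 2: 'd'},
})
long_df = to_long(df)
print(long_df)
```

The function works on the frame returned by set_index, so the frame passed in is left unchanged.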

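If df is read from a file, pass header=[0,1] to read the first and second rows into a MultiIndex and index_col=[0] to read the first column, Name, into the index. The solution then changes slightly: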

df = pd.read_csv('filename', header=[0,1], index_col=[0])


idx = pd.Series(df.columns.get_level_values(0))

df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                       df.columns.get_level_values(1)], 
                                       names=('Date','Month'))
print (df)
Date  2012                2013          ... 2014                           
Month   07 08 09 10 11 12   01 02 03 04 ...   03 04 05 06 07 08 09 10 11 12
Name                                    ...                                
A        a  b  c  d  e  f    a  b  c  d ...    c  d  e  f  g  h  i  j  k  l
B        a  b  c  d  e  f    a  b  c  d ...    c  d  e  f  g  h  i  j  k  l

print (df.columns)
MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
           names=['Date', 'Month'])

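# reshape: the index name is lost on read, so the third level comes back
# as 'level_2' and is renamed to Name
df2 = df.unstack().reset_index(name='Value').rename(columns={'level_2':'Name'})
df2['Date'] = df2['Month'].astype(str) + '/' + df2['Date'].astype(str)
# optional: convert to real datetimes (%Y is needed for the 4-digit year)
#df2['Date'] = pd.to_datetime(df2['Date'].radd('1/'), format='%d/%m/%Y')
df2 = df2.drop('Month', axis=1)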
print (df2)

       Date Name Value
0   07/2012    A     a
1   07/2012    B     a
2   08/2012    A     b
3   08/2012    B     b
4   09/2012    A     c
5   09/2012    B     c
6   10/2012    A     d
7   10/2012    B     d
8   11/2012    A     e 
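The question asked for labels such as 7/2012, without zero padding. One possible way to get there, sketched under the assumption that Date holds 'MM/YYYY' strings as in the output above, is to parse the strings and rebuild the label from the parsed month and year. The small df2 here is a hand-made stand-in.

```python
import pandas as pd

# Hand-made stand-in for the reshaped frame.
df2 = pd.DataFrame({
    'Date': ['07/2012', '07/2012', '12/2012', '01/2013'],
    'Name': ['A', 'B', 'A', 'A'],
    'Value': ['a', 'a', 'f', 'a'],
})

# Parse month/year strings; each timestamp falls on the first of the month.
parsed = pd.to_datetime(df2['Date'], format='%m/%Y')

# Rebuild the label without zero padding, e.g. '7/2012' instead of '07/2012'.
df2['Date'] = parsed.dt.month.astype(str) + '/' + parsed.dt.year.astype(str)
print(df2)
```

Keeping the parsed column around instead of the string label may be preferable when the rows need to be sorted or filtered chronologically.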
