Multi-index data frame with rows and columns

Question

I'm having a hard time splitting a data frame and am hoping to get some help. I'm trying to split the original data into one data frame per city, with the cities indexed in the top row and the dates in the first column. My actual data has 189 unique cities.

Original data:

This is my goal:

I've tried a number of different ways, but my indexes are still in the first two columns.

Answer 1

This can be done using df.pivot(), df.reorder_levels() and df.sort_index().

df.pivot(): reshape the long table into hierarchical (MultiIndex) columns
- axis=1 refers to columns while axis=0 refers to rows.
df.reorder_levels(): move City up and the Val columns down
df.sort_index(): sort the rows and columns using default or customized ordering (e.g. sort as datetime rather than str).
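To illustrate why the customized ordering matters, here is a minimal sketch with a made-up three-row Series (the values and the explicit `format` string are assumptions for illustration). It relies on the `key` argument of `sort_index`, which as far as I know requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings, as in the question
s = pd.Series([1, 2, 3], index=["6/1/1998", "10/1/1998", "9/1/1998"])

# Default sort compares the strings character by character,
# so "10/1/1998" lands before "6/1/1998"
by_str = s.sort_index()

# Sorting with a key parses the labels as dates first;
# the labels themselves stay as strings
by_date = s.sort_index(key=lambda idx: pd.to_datetime(idx, format="%m/%d/%Y"))

print(list(by_str.index))
print(list(by_date.index))
```

The string sort puts October first, while the keyed sort keeps June, September, October in calendar order.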

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={  # please provide sample data next time
        "City": ["NYC"]*5 + ["LA"]*5 + ["OKC"]*5,
        "Date": ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"]*3,
        "Val1": np.array(range(15))*10,
        "Val2": np.array(range(15))/10,
        "Val3": np.array(range(15)),
    }
)

df_out = df.pivot(index="Date", columns=["City"], values=["Val1", "Val2", "Val3"])\
    .reorder_levels([1, 0], axis=1)\
    .sort_index(axis=1)\
    .sort_index(axis=0, key=lambda s: pd.to_datetime(s))

Output:

In[27]: df_out
Out[27]: 

City         LA             NYC              OKC           
           Val1 Val2 Val3  Val1 Val2 Val3   Val1 Val2  Val3
Date                                                       
6/1/1998   50.0  0.5  5.0   0.0  0.0  0.0  100.0  1.0  10.0
7/1/1998   60.0  0.6  6.0  10.0  0.1  1.0  110.0  1.1  11.0
8/1/1998   70.0  0.7  7.0  20.0  0.2  2.0  120.0  1.2  12.0
9/1/1998   80.0  0.8  8.0  30.0  0.3  3.0  130.0  1.3  13.0
10/1/1998  90.0  0.9  9.0  40.0  0.4  4.0  140.0  1.4  14.0

NB: If you want to remove the "City" label at the top left, just set df_out.columns.names directly:

df_out.columns.names=[None, None]
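Since the question also asks for a separate data frame per city, here is a rough sketch of two ways that might be done. The small sample frame and the names `per_city`, `wide` and `nyc` are made up for illustration; only the column names follow the example above.

```python
import pandas as pd

# Hypothetical long-format sample with the same column names as above
df = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "LA"],
    "Date": ["6/1/1998", "7/1/1998", "6/1/1998", "7/1/1998"],
    "Val1": [0, 10, 20, 30],
    "Val2": [0.0, 0.1, 0.2, 0.3],
})

# Option 1: build a dict of frames, one per city, directly from the long data
per_city = {
    city: group.drop(columns="City").set_index("Date")
    for city, group in df.groupby("City")
}

# Option 2: pivot as in the answer, then select one top-level column label;
# selecting "NYC" drops the City level and leaves Val1/Val2 as columns
wide = (
    df.pivot(index="Date", columns="City", values=["Val1", "Val2"])
    .reorder_levels([1, 0], axis=1)
    .sort_index(axis=1)
)
nyc = wide["NYC"]

print(per_city["LA"])
print(nyc)
```

With 189 cities, the dict form is probably the more convenient of the two, since each frame can be looked up by city name.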

Answer 2

    import pandas as pd

    # create an example dataframe
    df = pd.DataFrame(
       {'date':[1990, 2000, 2010, 2020, 1990, 2000, 2010, 2020],
       'val1': [0,1,2,3, 10,11,12,13], 
       'val2':[5,6,7,8, 50,60,70,80],
       'city':['NYC', 'NYC',  'NYC', 'NYC', 'LA', 'LA','LA', 'LA']})
    # make a pivot table with multi-index
    df2  = df.pivot(index='date', columns='city')
    # reorder the multiindex as your desired output
    df2.columns = df2.columns.swaplevel(0, 1)
    df2.sort_index(axis=1, level=0, inplace=True)
    # print the dataframe
    df2

Output:
