MultiIndex data frame with rows and columns
I'm having a hard time splitting a data frame and am hoping to get some help. I'm trying to split the original data into one data frame per city, with the cities indexed in the top row and the dates in the first column. My actual data has 189 unique cities.
Original data:
This is my goal:
I've tried a number of different approaches, but my indexes are still in the first two columns.
This can be done using df.pivot(), df.reorder_levels() and df.sort_index().
df.pivot(): reshape (pivot) the table into hierarchical columns.
axis=1 refers to columns, while axis=0 refers to rows.
df.reorder_levels(): move the City level up and the Val level down.
df.sort_index(): sort the rows and columns using the default or a customized ordering (e.g. sort the dates as datetime rather than str).
Code:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data={  # please provide sample data next time
        "City": ["NYC"]*5 + ["LA"]*5 + ["OKC"]*5,
        "Date": ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"]*3,
        "Val1": np.array(range(15))*10,
        "Val2": np.array(range(15))/10,
        "Val3": np.array(range(15)),
    }
)
df_out = df.pivot(index="Date", columns=["City"], values=["Val1", "Val2", "Val3"])\
    .reorder_levels([1, 0], axis=1)\
    .sort_index(axis=1)\
    .sort_index(axis=0, key=lambda s: pd.to_datetime(s))
Output:
In[27]: df_out
Out[27]:
City          LA                  NYC                  OKC
            Val1   Val2   Val3   Val1   Val2   Val3   Val1   Val2   Val3
Date
6/1/1998    50.0    0.5    5.0    0.0    0.0    0.0  100.0    1.0   10.0
7/1/1998    60.0    0.6    6.0   10.0    0.1    1.0  110.0    1.1   11.0
8/1/1998    70.0    0.7    7.0   20.0    0.2    2.0  120.0    1.2   12.0
9/1/1998    80.0    0.8    8.0   30.0    0.3    3.0  130.0    1.3   13.0
10/1/1998   90.0    0.9    9.0   40.0    0.4    4.0  140.0    1.4   14.0
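To see why the last sort_index() call passes a key, here is a minimal sketch on a small Series with the same string date labels. The names s, by_str and by_date are only for illustration, and it assumes pandas 1.1 or newer, where sort_index() accepts key.

```python
import pandas as pd

# Date labels stored as strings, as in the sample data above
idx = pd.Index(
    ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"], name="Date"
)
s = pd.Series(range(5), index=idx)

# Plain string sort: "10/1/1998" lands first because "1" sorts before "6"
by_str = s.sort_index()

# Sort by the parsed dates instead; the labels themselves remain strings
by_date = s.sort_index(key=lambda i: pd.to_datetime(i))

print(list(by_str.index))
print(list(by_date.index))
```

The key function only affects the ordering, so the index keeps its original string labels after sorting.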
NB: If you want to remove the "City" label on the top-left side, just set df_out.columns.names directly:
df_out.columns.names=[None, None]
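If separate data frames per city are really needed, one possible follow-up is to slice the pivoted frame by its top column level. This is only a sketch on a shortened two-city version of the sample data; per_city is a hypothetical name, not something from the question.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data above (two cities, three dates)
df = pd.DataFrame({
    "City": ["NYC"] * 3 + ["LA"] * 3,
    "Date": ["6/1/1998", "7/1/1998", "8/1/1998"] * 2,
    "Val1": np.arange(6) * 10,
    "Val2": np.arange(6) / 10,
})

df_out = (
    df.pivot(index="Date", columns="City", values=["Val1", "Val2"])
      .reorder_levels([1, 0], axis=1)
      .sort_index(axis=1)
)

# Selecting a top-level column label returns that city's sub-frame,
# so a dict comprehension gives one DataFrame per city
cities = df_out.columns.get_level_values(0).unique()
per_city = {city: df_out[city] for city in cities}

print(per_city["NYC"])
```

With 189 cities the same comprehension applies unchanged, although keeping the single MultiIndex frame and selecting df_out[city] on demand may be simpler.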
import pandas as pd
# create an example dataframe
df = pd.DataFrame(
    {'date': [1990, 2000, 2010, 2020, 1990, 2000, 2010, 2020],
     'val1': [0, 1, 2, 3, 10, 11, 12, 13],
     'val2': [5, 6, 7, 8, 50, 60, 70, 80],
     'city': ['NYC', 'NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA', 'LA']})
# make a pivot table with multi-index
df2 = df.pivot(index='date', columns='city')
# reorder the multiindex as your desired output
df2.columns = df2.columns.swaplevel(0, 1)
df2.sort_index(axis=1, level=0, inplace=True)
# print the dataframe
df2
Output: