
Set two columns as the index in a pandas DataFrame for time series analysis

With weather or stock market data, temperatures and prices are recorded at multiple stations or for multiple stock tickers on any given date.

What is the most effective way to set an index that contains two fields?

For weather data: the weather_station and then Date

For stock data: the stock_code and then Date

Setting the index this way would allow filtering such as:

  • stock_df["code"]["start_date":"end_date"]
  • weather_df["station"]["start_date":"end_date"]

As Anton mentioned, you need to use a MultiIndex, as follows:

stock_df.index = pd.MultiIndex.from_arrays(stock_df[['code', 'date']].values.T, names=['idx1', 'idx2'])

weather_df.index = pd.MultiIndex.from_arrays(weather_df[['station', 'date']].values.T, names=['idx1', 'idx2'])
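
As a possible alternative to transposing `.values`, the same two-level index can be built directly from the columns with `pd.MultiIndex.from_frame`. This is only a minimal sketch: the sample rows are made up for illustration, and converting `date` to datetimes is an assumption that is not part of the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the snippet above.
stock_df = pd.DataFrame({
    "code": ["AAPL", "AAPL", "F", "F"],
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02"]
    ),
    "price": [100.0, 101.0, 50.0, 47.5],
})

# Build the index from the two columns. The level names default to the
# column names, and the original columns stay in the frame.
stock_df.index = pd.MultiIndex.from_frame(stock_df[["code", "date"]])

# Sort the index so that label-based slicing behaves predictably.
stock_df = stock_df.sort_index()
```

Here the index levels are named `code` and `date` rather than `idx1` and `idx2`, which tends to make later selections easier to read.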

This functionality already exists: pass both columns to set_index. Please refer to the pandas documentation for more examples.

import pandas as pd

stock_df = pd.DataFrame({'symbol': ['AAPL', 'AAPL', 'F', 'F', 'F'], 
                         'date': ['2016-1-1', '2016-1-2', '2016-1-1', '2016-1-2', '2016-1-3'], 
                         'price': [100., 101, 50, 47.5, 49]}).set_index(['symbol', 'date'])

>>> stock_df
                 price
symbol date           
AAPL   2016-1-1  100.0
       2016-1-2  101.0
F      2016-1-1   50.0
       2016-1-2   47.5
       2016-1-3   49.0

>>> stock_df.loc['AAPL']
          price
date           
2016-1-1  100.0
2016-1-2  101.0

>>> stock_df.loc['AAPL', '2016-1-2']
price    101.0
Name: (AAPL, 2016-1-2), dtype: float64
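
The date-range filtering asked about in the question could be sketched as follows. Note the assumptions: unlike the example above, the `date` column is converted to datetimes with `pd.to_datetime` and the index is sorted before slicing; the variable names `f_range` and `all_range` are purely illustrative.

```python
import pandas as pd

stock_df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "F", "F", "F"],
    # Assumption: dates are parsed to datetimes rather than kept as strings.
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-01", "2016-01-02", "2016-01-03"]
    ),
    "price": [100.0, 101.0, 50.0, 47.5, 49.0],
}).set_index(["symbol", "date"]).sort_index()

# One symbol, then a date range. Selecting the symbol first leaves a frame
# indexed by date only; label slices include both endpoints.
f_range = stock_df.loc["F"].loc["2016-01-02":"2016-01-03"]

# The same kind of date range across every symbol, using pd.IndexSlice.
idx = pd.IndexSlice
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2016-01-02")
all_range = stock_df.loc[idx[:, start:end], :]
```

The first form is close to the `stock_df["code"]["start_date":"end_date"]` pattern from the question, with `.loc` used for both steps. The second keeps both index levels in the result.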
