
How to manipulate MultiIndex pandas series?

I need to extract data from multiple sites.

First, I read the file:

dfs = pd.read_excel('Consumption Report.xlsx', sheet_name='Elec Monthly Cons', header=[0,1], index_col=[0,1])

[Image: screenshot of the DataFrame in Jupyter]

What I have tried so far:

dfs.iloc[0]

Output:

Site        Profile 
2014-01-01  JAN 2014    10344.0
2014-02-01  FEB 2014        NaN
2014-03-01  MAR 2014        NaN
2014-04-01  APR 2014    16745.0
2014-05-01  MAY 2014        NaN
2014-06-01  JUN 2014        NaN
2014-07-01  JUL 2014     9284.0
2014-08-01  AUG 2014        NaN
2014-09-01  SEP 2014     9235.7
2014-10-01  OCT 2014        NaN
2014-11-01  NOV 2014     9966.0
2014-12-01  DEC 2014        NaN
2015-01-01  JAN 2015        NaN
2015-02-01  FEB 2015    14616.0
2015-03-01  MAR 2015        NaN
2015-04-01  APR 2015        NaN
2015-05-01  MAY 2015    15404.0

How do I extract the values from the last column?

This is the index:

MultiIndex(levels=[[2014-01-01 00:00:00, 2014-02-01 00:00:00, 2014-03-01 00:00:00, 2014-04-01 00:00:00, 2014-05-01 00:00:00, 2014-06-01 00:00:00, 2014-07-01 00:00:00, 2014-08-01 00:00:00, 2014-09-01 00:00:00, 2014-10-01 00:00:00, 2014-11-01 00:00:00, 2014-12-01 00:00:00, 2015-01-01 00:00:00, 2015-02-01 00:00:00, 2015-03-01 00:00:00, 2015-04-01 00:00:00, 2015-05-01 00:00:00, 2015-06-01 00:00:00, 2015-07-01 00:00:00, 2015-08-01 00:00:00, 2015-09-01 00:00:00, 2015-10-01 00:00:00, 2015-11-01 00:00:00, 2015-12-01 00:00:00, 2016-01-01 00:00:00, 2016-02-01 00:00:00, 2016-03-01 00:00:00, 2016-04-01 00:00:00, 2016-05-01 00:00:00, 2016-06-01 00:00:00, 2016-07-01 00:00:00, 2016-08-01 00:00:00, 2016-09-01 00:00:00, 2016-10-01 00:00:00, 2016-11-01 00:00:00, 2016-12-01 00:00:00, 2017-01-01 00:00:00, 2017-02-01 00:00:00, 2017-03-01 00:00:00, 2017-04-01 00:00:00, 2017-05-01 00:00:00, 2017-06-01 00:00:00, 2017-07-01 00:00:00, 2017-08-01 00:00:00, 2017-09-01 00:00:00, 2017-10-01 00:00:00, 2017-11-01 00:00:00, 2017-12-01 00:00:00], ['APR 2014', 'APR 2015', 'APR 2016', 'APR 2017', 'AUG 2014', 'AUG 2015', 'AUG 2016', 'AUG 2017', 'DEC 2014', 'DEC 2015', 'DEC 2016', 'DEC 2017', 'FEB 2014', 'FEB 2015', 'FEB 2016', 'FEB 2017', 'JAN 2014', 'JAN 2015', 'JAN 2016', 'JAN 2017', 'JUL 2014', 'JUL 2015', 'JUL 2016', 'JUL 2017', 'JUN 2014', 'JUN 2015', 'JUN 2016', 'JUN 2017', 'MAR 2014', 'MAR 2015', 'MAR 2016', 'MAR 2017', 'MAY 2014', 'MAY 2015', 'MAY 2016', 'MAY 2017', 'NOV 2014', 'NOV 2015', 'NOV 2016', 'NOV 2017', 'OCT 2014', 'OCT 2015', 'OCT 2016', 'OCT 2017', 'SEP 2014', 'SEP 2015', 'SEP 2016', 'SEP 2017']],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], [16, 12, 28, 0, 32, 24, 20, 4, 44, 40, 36, 8, 17, 13, 29, 1, 33, 25, 21, 5, 45, 41, 37, 9, 18, 14, 30, 2, 34, 26, 22, 6, 46, 42, 38, 10, 19, 15, 31, 3, 35, 27, 23, 7, 47, 43, 39, 11]],
           names=['Site', 'Profile'])

If I follow what Evan suggested:

df.index.get_level_values(level=-1)

Output:

Index(['JAN 2014', 'FEB 2014', 'MAR 2014', 'APR 2014', 'MAY 2014', 'JUN 2014',
       'JUL 2014', 'AUG 2014', 'SEP 2014', 'OCT 2014', 'NOV 2014', 'DEC 2014',
       'JAN 2015', 'FEB 2015', 'MAR 2015', 'APR 2015', 'MAY 2015', 'JUN 2015',
       'JUL 2015', 'AUG 2015', 'SEP 2015', 'OCT 2015', 'NOV 2015', 'DEC 2015',
       'JAN 2016', 'FEB 2016', 'MAR 2016', 'APR 2016', 'MAY 2016', 'JUN 2016',
       'JUL 2016', 'AUG 2016', 'SEP 2016', 'OCT 2016', 'NOV 2016', 'DEC 2016',
       'JAN 2017', 'FEB 2017', 'MAR 2017', 'APR 2017', 'MAY 2017', 'JUN 2017',
       'JUL 2017', 'AUG 2017', 'SEP 2017', 'OCT 2017', 'NOV 2017', 'DEC 2017'],
      dtype='object', name='Profile')

Level zero:

df.index.get_level_values(level=0)

DatetimeIndex(['2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01',
               '2014-05-01', '2014-06-01', '2014-07-01', '2014-08-01',
               '2014-09-01', '2014-10-01', '2014-11-01', '2014-12-01',
               '2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01',
               '2015-05-01', '2015-06-01', '2015-07-01', '2015-08-01',
               '2015-09-01', '2015-10-01', '2015-11-01', '2015-12-01',
               '2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01',
               '2016-09-01', '2016-10-01', '2016-11-01', '2016-12-01',
               '2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
               '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
               '2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01'],
              dtype='datetime64[ns]', name='Site', freq=None)

How do I get the values from the non-index column?

File uploaded: https://ufile.io/m5nbc

Given a dataframe:

"""
IndexID IndexDateTime IndexAttribute ColumnA ColumnB
   1      2015-02-05        8           A       B
   1      2015-02-05        7           C       D
   1      2015-02-10        7           X       Y
"""

import pandas as pd
import numpy as np

# Copy the table above (without the triple quotes) to the clipboard before running this line.
df = pd.read_clipboard(parse_dates=["IndexDateTime"]).set_index(["IndexID", "IndexDateTime", "IndexAttribute"])
df

Output:

                                     ColumnA ColumnB
IndexID IndexDateTime IndexAttribute                
1       2015-02-05    8                    A       B
                      7                    C       D
        2015-02-10    7                    X       Y

The values of the last column (ColumnB) can be accessed via df.loc[:, "ColumnB"].values, which returns only the underlying values without the index, or via df.loc[:, "ColumnB"], which returns the Series shown below. See: https://pandas.pydata.org/pandas-docs/stable/indexing.html

IndexID  IndexDateTime  IndexAttribute
1        2015-02-05     8                 B
                        7                 D
         2015-02-10     7                 Y
Name: ColumnB, dtype: object
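For readers who cannot use the clipboard, here is a minimal sketch that rebuilds the same frame from an inline string and shows both access forms side by side. The variable name `raw` and the use of `io.StringIO` with `pd.read_csv` are assumptions made for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Same three rows as the example table; `raw` is an illustrative name.
raw = """IndexID IndexDateTime IndexAttribute ColumnA ColumnB
1 2015-02-05 8 A B
1 2015-02-05 7 C D
1 2015-02-10 7 X Y
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",  # whitespace-separated columns
    parse_dates=["IndexDateTime"],
).set_index(["IndexID", "IndexDateTime", "IndexAttribute"])

# A Series that keeps the three-level row index.
column_b = df.loc[:, "ColumnB"]

# Only the underlying values, with the index dropped.
values_b = column_b.values

print(column_b)
print(list(values_b))
```

If a NumPy array is specifically required, `column_b.to_numpy()` is the more explicit spelling.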

In df.loc[rows, columns] and df.iloc[rows, columns], the first argument selects the rows and the second selects the columns. df.loc selects by label, while df.iloc selects by integer position.
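The difference between label-based and position-based selection can be sketched on a small frame. The frame below is a hypothetical stand-in with a two-level row index, not the asker's data, and all variable names are illustrative.

```python
import pandas as pd

# Small frame with a two-level row index; all names here are illustrative.
index = pd.MultiIndex.from_tuples(
    [(1, 8), (1, 7), (2, 7)], names=["IndexID", "IndexAttribute"]
)
df = pd.DataFrame(
    {"ColumnA": ["A", "C", "X"], "ColumnB": ["B", "D", "Y"]}, index=index
)

# Label-based: all rows, the column labelled "ColumnB".
by_label = df.loc[:, "ColumnB"]

# Position-based: all rows, the last column.
by_position = df.iloc[:, -1]

# Position-based: the first row, all columns (returns a Series).
first_row = df.iloc[0, :]

print(by_label.equals(by_position))
print(first_row)
```

This is also why dfs.iloc[0] in the question returns a Series: it picks the first row by position, and the column labels of the frame become the index of the result.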

To get the values from the index:

df.index.get_level_values(level=-1)
df.index.get_level_values(level="IndexAttribute")

Both return:

Int64Index([8, 7, 7], dtype='int64', name='IndexAttribute')
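In the question, dfs.iloc[0] is a Series whose index carries the Site and Profile levels, so the numbers printed on the right are the values of the Series rather than an index level. A minimal sketch, assuming a three-month stand-in for that Series (the names `row` and "Consumption" are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's Series: three months, one missing.
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2014-01-01", "2014-02-01", "2014-04-01"]),
        ["JAN 2014", "FEB 2014", "APR 2014"],
    ],
    names=["Site", "Profile"],
)
row = pd.Series([10344.0, np.nan, 16745.0], index=index)

# One index level, selected by name.
profiles = row.index.get_level_values("Profile")

# The numbers themselves are the values of the Series.
values = row.to_numpy()

# Optionally drop the missing months, or turn the index levels into columns.
non_missing = row.dropna()
as_frame = row.reset_index(name="Consumption")

print(list(profiles))
print(values)
print(as_frame)
```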

Is that what you had in mind?
