简体   繁体   English

使用MultiIndex操纵熊猫数据框

[英]Manipulating Pandas Dataframe with MultiIndex

I have a pandas DataFrame formatted as such: 我有一个熊猫DataFrame格式化为:

  mesh 1          energy low [eV] energy high [eV] nuclide score     mean  
           x    y   z                                                           
0          1    1   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
1          1    1   2        1.00e-03         2.00e+07   total  flux 1.82e-03   
2          1    1   3        1.00e-03         2.00e+07   total  flux 6.96e-03   
3          1    1   4        1.00e-03         2.00e+07   total  flux 1.47e-03   
4          1    1   5        1.00e-03         2.00e+07   total  flux 6.93e-03   
5          1    1   6        1.00e-03         2.00e+07   total  flux 8.73e-03   
6          1    1   7        1.00e-03         2.00e+07   total  flux 1.34e-02   
7          1    1   8        1.00e-03         2.00e+07   total  flux 1.16e-02   
8          1    1   9        1.00e-03         2.00e+07   total  flux 4.14e-03   
9          1    1  10        1.00e-03         2.00e+07   total  flux 5.26e-03   
10         1    2   1        1.00e-03         2.00e+07   total  flux 6.16e-03   
11         1    2   2        1.00e-03         2.00e+07   total  flux 1.76e-02   
12         1    2   3        1.00e-03         2.00e+07   total  flux 1.80e-02   
13         1    2   4        1.00e-03         2.00e+07   total  flux 1.97e-02   
14         1    2   5        1.00e-03         2.00e+07   total  flux 1.76e-02   
15         1    2   6        1.00e-03         2.00e+07   total  flux 1.90e-02   
16         1    2   7        1.00e-03         2.00e+07   total  flux 3.53e-02   
17         1    2   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
18         1    2   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
19         1    2  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
20         1    3   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
21         1    3   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
22         1    3   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
23         1    3   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
24         1    3   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
25         1    3   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
26         1    3   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
27         1    3   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
28         1    3   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
29         1    3  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
...      ...  ...  ..             ...              ...     ...   ...      ...   
99970    100   98   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99971    100   98   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99972    100   98   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99973    100   98   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99974    100   98   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99975    100   98   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99976    100   98   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99977    100   98   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99978    100   98   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99979    100   98  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
99980    100   99   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99981    100   99   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99982    100   99   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99983    100   99   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99984    100   99   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99985    100   99   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99986    100   99   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99987    100   99   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99988    100   99   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99989    100   99  10        1.00e-03         2.00e+07   total  flux 0.00e+00   
99990    100  100   1        1.00e-03         2.00e+07   total  flux 0.00e+00   
99991    100  100   2        1.00e-03         2.00e+07   total  flux 0.00e+00   
99992    100  100   3        1.00e-03         2.00e+07   total  flux 0.00e+00   
99993    100  100   4        1.00e-03         2.00e+07   total  flux 0.00e+00   
99994    100  100   5        1.00e-03         2.00e+07   total  flux 0.00e+00   
99995    100  100   6        1.00e-03         2.00e+07   total  flux 0.00e+00   
99996    100  100   7        1.00e-03         2.00e+07   total  flux 0.00e+00   
99997    100  100   8        1.00e-03         2.00e+07   total  flux 0.00e+00   
99998    100  100   9        1.00e-03         2.00e+07   total  flux 0.00e+00   
99999    100  100  10        1.00e-03         2.00e+07   total  flux 0.00e+00   

RangeIndex(start=0, stop=100000, step=1)
MultiIndex(levels=[['energy high [eV]', 'energy low [eV]', 'mean', 'mesh 1', 'nuclide', 'score', 'std. dev.'], ['', 'x', 'y', 'z']],
           labels=[[3, 3, 3, 1, 0, 4, 5, 2, 6], [1, 2, 3, 0, 0, 0, 0, 0, 0]])

I would like to have 10 pandas dataframes (since 'mesh 1', 'z' goes to 10) in a list where in each dataframe the rows are ('mesh 1', 'y'), the columns are ('mesh 1', 'x') and the values are 'mean'. 我想在一个列表中有10个熊猫数据框(因为'mesh 1','z'变为10),其中每个数据框中的行是('mesh 1','y'),列是('mesh 1 ','x'),其值为'mean'。 I have figured out how to get the 10 dataframes in a list: 我已经弄清楚了如何获取列表中的10个数据帧:

axial_dfs = []
    for i in range(10):
        temp_df = flux_df[flux_df['mesh 1']['z'] == i]
        axial_dfs.append(temp_df)

But I can't figure out how to change the rows and columns. 但是我不知道如何更改行和列。 I would try pivot but I don't know how with the MultiIndex for 'mesh 1'. 我会尝试进行数据透视,但是我不知道如何使用MultiIndex来实现“网格1”。

Appreciate all the help! 感谢所有帮助! Thanks! 谢谢!

I'm a little confused about what you need but I think merging the column levels together in your temp_df will help you: 我对您的需求有些困惑,但我认为将列级别合并到temp_df将对您temp_df帮助:

axial_dfs = []
    for i in range(10):
        temp_df = flux_df[flux_df['mesh 1']['z'] == i]
        temp_df.columns = temp_df.columns.map('_'.join)  # add this line
        axial_dfs.append(temp_df)

Now, all of the frames in axial_dfs will have one level of columns (eg mesh 1_x or mesh 1_y ), which it sounds like you're comfortable manipulating on your own (using pandas.DataFrame.pivot_table or pandas.DataFrame.groupby ). 现在, axial_dfs所有框架axial_dfs将具有一层列(例如, mesh 1_x mesh 1_ymesh 1_y ),这听起来像您很愿意自己操作(使用pandas.DataFrame.pivot_tablepandas.DataFrame.groupby )。

In the following example, I use unstack to turn the second index level into a column index. 在下面的示例中,我使用unstack将第二个索引级别转换为列索引。 Then, I use a list comprehension to split the result into a list determined by the levels of the first index. 然后,我使用列表推导将结果分成由第一个索引的级别确定的列表。

import pandas as pd
import numpy as np

# Create simple example
data = np.random.randint(8, size=(8, 2))
levels = [['df1', 'df2'], ['a', 'b'], [1, 2]]
idx = pd.MultiIndex.from_product(levels, names=['first', 'second', 'third'])
df = pd.DataFrame(data, index=idx, columns=['col1', 'col2'])

# Step 1: unstack to get second level as column index
df = df.unstack(level='second')['col2']

# Step 2: get a list of chunks of df by first index level
first_unique = df.index.get_level_values('first').unique()
df_ls = [df.loc[x] for x in first_unique]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM