Manipulating Pandas Dataframe with MultiIndex
I have a pandas DataFrame formatted as such:
mesh 1 energy low [eV] energy high [eV] nuclide score mean
x y z
0 1 1 1 1.00e-03 2.00e+07 total flux 0.00e+00
1 1 1 2 1.00e-03 2.00e+07 total flux 1.82e-03
2 1 1 3 1.00e-03 2.00e+07 total flux 6.96e-03
3 1 1 4 1.00e-03 2.00e+07 total flux 1.47e-03
4 1 1 5 1.00e-03 2.00e+07 total flux 6.93e-03
5 1 1 6 1.00e-03 2.00e+07 total flux 8.73e-03
6 1 1 7 1.00e-03 2.00e+07 total flux 1.34e-02
7 1 1 8 1.00e-03 2.00e+07 total flux 1.16e-02
8 1 1 9 1.00e-03 2.00e+07 total flux 4.14e-03
9 1 1 10 1.00e-03 2.00e+07 total flux 5.26e-03
10 1 2 1 1.00e-03 2.00e+07 total flux 6.16e-03
11 1 2 2 1.00e-03 2.00e+07 total flux 1.76e-02
12 1 2 3 1.00e-03 2.00e+07 total flux 1.80e-02
13 1 2 4 1.00e-03 2.00e+07 total flux 1.97e-02
14 1 2 5 1.00e-03 2.00e+07 total flux 1.76e-02
15 1 2 6 1.00e-03 2.00e+07 total flux 1.90e-02
16 1 2 7 1.00e-03 2.00e+07 total flux 3.53e-02
17 1 2 8 1.00e-03 2.00e+07 total flux 0.00e+00
18 1 2 9 1.00e-03 2.00e+07 total flux 0.00e+00
19 1 2 10 1.00e-03 2.00e+07 total flux 0.00e+00
20 1 3 1 1.00e-03 2.00e+07 total flux 0.00e+00
21 1 3 2 1.00e-03 2.00e+07 total flux 0.00e+00
22 1 3 3 1.00e-03 2.00e+07 total flux 0.00e+00
23 1 3 4 1.00e-03 2.00e+07 total flux 0.00e+00
24 1 3 5 1.00e-03 2.00e+07 total flux 0.00e+00
25 1 3 6 1.00e-03 2.00e+07 total flux 0.00e+00
26 1 3 7 1.00e-03 2.00e+07 total flux 0.00e+00
27 1 3 8 1.00e-03 2.00e+07 total flux 0.00e+00
28 1 3 9 1.00e-03 2.00e+07 total flux 0.00e+00
29 1 3 10 1.00e-03 2.00e+07 total flux 0.00e+00
... ... ... .. ... ... ... ... ...
99970 100 98 1 1.00e-03 2.00e+07 total flux 0.00e+00
99971 100 98 2 1.00e-03 2.00e+07 total flux 0.00e+00
99972 100 98 3 1.00e-03 2.00e+07 total flux 0.00e+00
99973 100 98 4 1.00e-03 2.00e+07 total flux 0.00e+00
99974 100 98 5 1.00e-03 2.00e+07 total flux 0.00e+00
99975 100 98 6 1.00e-03 2.00e+07 total flux 0.00e+00
99976 100 98 7 1.00e-03 2.00e+07 total flux 0.00e+00
99977 100 98 8 1.00e-03 2.00e+07 total flux 0.00e+00
99978 100 98 9 1.00e-03 2.00e+07 total flux 0.00e+00
99979 100 98 10 1.00e-03 2.00e+07 total flux 0.00e+00
99980 100 99 1 1.00e-03 2.00e+07 total flux 0.00e+00
99981 100 99 2 1.00e-03 2.00e+07 total flux 0.00e+00
99982 100 99 3 1.00e-03 2.00e+07 total flux 0.00e+00
99983 100 99 4 1.00e-03 2.00e+07 total flux 0.00e+00
99984 100 99 5 1.00e-03 2.00e+07 total flux 0.00e+00
99985 100 99 6 1.00e-03 2.00e+07 total flux 0.00e+00
99986 100 99 7 1.00e-03 2.00e+07 total flux 0.00e+00
99987 100 99 8 1.00e-03 2.00e+07 total flux 0.00e+00
99988 100 99 9 1.00e-03 2.00e+07 total flux 0.00e+00
99989 100 99 10 1.00e-03 2.00e+07 total flux 0.00e+00
99990 100 100 1 1.00e-03 2.00e+07 total flux 0.00e+00
99991 100 100 2 1.00e-03 2.00e+07 total flux 0.00e+00
99992 100 100 3 1.00e-03 2.00e+07 total flux 0.00e+00
99993 100 100 4 1.00e-03 2.00e+07 total flux 0.00e+00
99994 100 100 5 1.00e-03 2.00e+07 total flux 0.00e+00
99995 100 100 6 1.00e-03 2.00e+07 total flux 0.00e+00
99996 100 100 7 1.00e-03 2.00e+07 total flux 0.00e+00
99997 100 100 8 1.00e-03 2.00e+07 total flux 0.00e+00
99998 100 100 9 1.00e-03 2.00e+07 total flux 0.00e+00
99999 100 100 10 1.00e-03 2.00e+07 total flux 0.00e+00
RangeIndex(start=0, stop=100000, step=1)
MultiIndex(levels=[['energy high [eV]', 'energy low [eV]', 'mean', 'mesh 1', 'nuclide', 'score', 'std. dev.'], ['', 'x', 'y', 'z']],
labels=[[3, 3, 3, 1, 0, 4, 5, 2, 6], [1, 2, 3, 0, 0, 0, 0, 0, 0]])
I would like to have 10 pandas DataFrames in a list (since ('mesh 1', 'z') goes up to 10), where in each DataFrame the rows are ('mesh 1', 'y'), the columns are ('mesh 1', 'x') and the values are 'mean'. I have figured out how to get the 10 DataFrames in a list:
axial_dfs = []
for i in range(1, 11):  # 'z' runs from 1 to 10, so range(10) would miss z == 10
    temp_df = flux_df[flux_df['mesh 1']['z'] == i]
    axial_dfs.append(temp_df)
But I can't figure out how to change the rows and columns. I would try pivot, but I don't know how to use it with the MultiIndex for 'mesh 1'.
Appreciate all the help! Thanks!
I'm a little confused about what you need, but I think merging the column levels together in your temp_df will help you:
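axial_dfs = []
for i in range(1, 11):  # 'z' runs from 1 to 10
    # .copy() so that renaming the columns does not touch flux_df
    temp_df = flux_df[flux_df['mesh 1']['z'] == i].copy()
    # join the two column levels, e.g. ('mesh 1', 'x') -> 'mesh 1_x'
    temp_df.columns = temp_df.columns.map('_'.join)  # add this line
    axial_dfs.append(temp_df)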
Now, all of the frames in axial_dfs will have one level of columns (e.g. mesh 1_x or mesh 1_y), which it sounds like you're comfortable manipulating on your own (using pandas.DataFrame.pivot_table or pandas.DataFrame.groupby). Note that columns whose second level is an empty string pick up a trailing underscore from the join, so 'mean' becomes 'mean_'.
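To make the last step concrete, here is a minimal sketch of how the flattened frames could then be pivoted. It is not taken from the question: flux_df is a small synthetic stand-in (a 3 x 2 x 2 mesh with made-up 'mean' values), the name flat is hypothetical, and the trailing underscore is stripped so the value column is simply 'mean'.

```python
import itertools

import numpy as np
import pandas as pd

# Synthetic stand-in for flux_df (assumption): x in 1..3, y in 1..2, z in 1..2,
# with z varying fastest, as in the question's printout.
coords = np.array(list(itertools.product(range(1, 4), range(1, 3), range(1, 3))))
flux_df = pd.DataFrame({
    ('mesh 1', 'x'): coords[:, 0],
    ('mesh 1', 'y'): coords[:, 1],
    ('mesh 1', 'z'): coords[:, 2],
    ('mean', ''): np.arange(len(coords), dtype=float),  # made-up values
})

# Flatten the two column levels and drop the trailing '_' left by empty levels.
flat = flux_df.copy()
flat.columns = flat.columns.map('_'.join).str.rstrip('_')

# One pivoted frame per z value: rows are y, columns are x, values are 'mean'.
axial_dfs = [
    flat[flat['mesh 1_z'] == z].pivot(index='mesh 1_y', columns='mesh 1_x', values='mean')
    for z in sorted(flat['mesh 1_z'].unique())
]
```

Each element of axial_dfs is then a y-by-x table for one z slice. On the real data the same comprehension should give 10 frames of shape 100 x 100, assuming every (x, y, z) combination appears exactly once; pivot raises an error if there are duplicates.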
In the following example, I use unstack to turn the second index level into a column index. Then, I use a list comprehension to split the result into a list determined by the levels of the first index.
import pandas as pd
import numpy as np
# Create simple example
data = np.random.randint(8, size=(8, 2))
levels = [['df1', 'df2'], ['a', 'b'], [1, 2]]
idx = pd.MultiIndex.from_product(levels, names=['first', 'second', 'third'])
df = pd.DataFrame(data, index=idx, columns=['col1', 'col2'])
# Step 1: unstack to get second level as column index
df = df.unstack(level='second')['col2']
# Step 2: get a list of chunks of df by first index level
first_unique = df.index.get_level_values('first').unique()
df_ls = [df.loc[x] for x in first_unique]
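The same two steps could be applied to a frame shaped like the one in the question. The sketch below is an illustration under assumptions, not part of the original answer: flux_df is a small synthetic stand-in with the question's column layout, and the names mean and wide are hypothetical. The 'mesh 1' columns are first moved into a row MultiIndex so that unstack has something to work with.

```python
import itertools

import numpy as np
import pandas as pd

# Synthetic stand-in for flux_df (assumption): x in 1..3, y in 1..2, z in 1..2.
coords = np.array(list(itertools.product(range(1, 4), range(1, 3), range(1, 3))))
flux_df = pd.DataFrame({
    ('mesh 1', 'x'): coords[:, 0],
    ('mesh 1', 'y'): coords[:, 1],
    ('mesh 1', 'z'): coords[:, 2],
    ('mean', ''): np.arange(len(coords), dtype=float),  # made-up values
})

# Build a Series of 'mean' indexed by (z, y, x); flux_df['mesh 1'] drops the
# top column level, leaving plain columns 'x', 'y', 'z'.
mean = pd.Series(
    flux_df[('mean', '')].to_numpy(),
    index=pd.MultiIndex.from_frame(flux_df['mesh 1'][['z', 'y', 'x']]),
)

# Step 1: unstack x so it becomes the column index; rows are (z, y).
wide = mean.unstack('x')

# Step 2: split into one frame per z value; each has rows y and columns x.
z_values = wide.index.get_level_values('z').unique()
axial_dfs = [wide.loc[z] for z in z_values]
```

This avoids flattening the column names at all, at the cost of assuming each (x, y, z) combination is unique; unstack raises an error on duplicate index entries.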