Extracting and plotting data from a MultiIndex DataFrame in pandas

Question

I've managed to get the following table into a pandas DataFrame. It has a multi-level index (file_type, server_count, file_count, thread_count, cacheclear_type) that represents the configuration for a performance measurement. I then have 5 runs for each configuration.

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 17.061  | 15.414  | 16.819  | 15.597  | 16.818  |
| gorz      | 01servers    | 05files    | 20threads    | ccALWAYS        | 12.285  | 11.218  | 12.009  | 14.122  | 10.991  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 9.881   | 9.405   | 9.322   | 10.184  | 9.924   |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 17.167  | 17.666  | 15.950  | 18.867  | 16.569  |
| gorz      | 01servers    | 10files    | 20threads    | ccALWAYS        | 14.718  | 19.553  | 17.930  | 21.415  | 21.495  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 10.236  | 9.948   | 12.605  | 9.780   | 10.320  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 23.083  | 22.084  | 25.024  | 24.677  | 20.648  |
| gorz      | 01servers    | 15files    | 20threads    | ccALWAYS        | 15.401  | 28.282  | 28.727  | 24.645  | 27.509  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 10.307  | 12.217  | 13.005  | 12.277  | 12.224  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 35.393  | 36.218  | 35.949  | 35.157  | 37.342  |
| gorz      | 01servers    | 20files    | 20threads    | ccALWAYS        | 19.505  | 23.756  | 25.767  | 26.575  | 25.239  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 11.398  | 11.332  | 15.086  | 16.115  | 13.479  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+

I would like to take all the gor, 01servers, 20threads, ccALWAYS configurations and create one data point for each of the XXfiles configurations. So to begin with, I'd like to somehow get a DataFrame that looks like this:

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+

How do I do that?

Answer 1

I managed to filter the data with the query() function so that it looks like the second table in the question, using the following code:

df.query('file_type == "gor" & server_count == "01servers"').sort_index(level="file_count")
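For anyone who wants to try this end to end, here is a minimal sketch rather than a definitive recipe. It rebuilds a few rows of the question's table by hand (the variable names `rows`, `subset` and `mean_per_file_count` are my own, and the rows are deliberately out of order), applies the same query() filter, and then reduces the five runs to one value per file_count. Taking the mean is my assumption about what "one data point" should be, since the question does not say. It uses sort_index because sortlevel is no longer available in recent pandas releases.

```python
import pandas as pd

# Names of the five index levels, as shown in the question's table.
index_names = ["file_type", "server_count", "file_count",
               "thread_count", "cacheclear_type"]
run_cols = ["run_001", "run_002", "run_003", "run_004", "run_005"]

# A hand-copied subset of the question's data, intentionally unsorted.
rows = [
    ("gor", "01servers", "10files", "20threads", "ccALWAYS", 17.322, 17.636, 16.096, 16.484, 16.715),
    ("gor", "08servers", "05files", "20threads", "ccALWAYS", 17.061, 15.414, 16.819, 15.597, 16.818),
    ("gorz", "01servers", "05files", "20threads", "ccALWAYS", 12.285, 11.218, 12.009, 14.122, 10.991),
    ("gor", "01servers", "05files", "20threads", "ccALWAYS", 15.918, 16.275, 15.807, 17.781, 16.233),
    ("gor", "01servers", "20files", "20threads", "ccALWAYS", 23.744, 20.539, 21.416, 22.921, 22.794),
    ("gor", "01servers", "15files", "20threads", "ccALWAYS", 19.265, 17.128, 17.630, 18.739, 16.833),
]
df = pd.DataFrame(rows, columns=index_names + run_cols).set_index(index_names)

# query() can refer to named index levels as if they were columns.
subset = (
    df.query('file_type == "gor" and server_count == "01servers"')
      .sort_index(level="file_count")
)

# Assumption: one data point per file_count means the mean of the five runs.
# The constant index levels are dropped so only file_count labels remain.
mean_per_file_count = subset.mean(axis=1).droplevel(
    ["file_type", "server_count", "thread_count", "cacheclear_type"]
)
print(mean_per_file_count)
```

If matplotlib is installed, calling mean_per_file_count.plot(marker="o") on the result should draw one point per XXfiles configuration.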

Answer 1 posted 2014-10-15 11:09:09
