Extracting and plotting data from a MultiIndex DataFrame in pandas

I've managed to get the following table into a pandas DataFrame. It has a five-level MultiIndex (file_type, server_count, file_count, thread_count, cacheclear_type) that represents the configuration of a performance measurement. I then have 5 runs for each configuration.

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 17.061  | 15.414  | 16.819  | 15.597  | 16.818  |
| gorz      | 01servers    | 05files    | 20threads    | ccALWAYS        | 12.285  | 11.218  | 12.009  | 14.122  | 10.991  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 9.881   | 9.405   | 9.322   | 10.184  | 9.924   |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 17.167  | 17.666  | 15.950  | 18.867  | 16.569  |
| gorz      | 01servers    | 10files    | 20threads    | ccALWAYS        | 14.718  | 19.553  | 17.930  | 21.415  | 21.495  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 10.236  | 9.948   | 12.605  | 9.780   | 10.320  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 23.083  | 22.084  | 25.024  | 24.677  | 20.648  |
| gorz      | 01servers    | 15files    | 20threads    | ccALWAYS        | 15.401  | 28.282  | 28.727  | 24.645  | 27.509  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 10.307  | 12.217  | 13.005  | 12.277  | 12.224  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 35.393  | 36.218  | 35.949  | 35.157  | 37.342  |
| gorz      | 01servers    | 20files    | 20threads    | ccALWAYS        | 19.505  | 23.756  | 25.767  | 26.575  | 25.239  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 11.398  | 11.332  | 15.086  | 16.115  | 13.479  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
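
For anyone who wants to reproduce this, a frame of the same shape can be built roughly as follows. This is only a sketch: it copies a few rows from the table above, and the variable names (`index_names`, `run_names`, `rows`) are illustrative choices, not taken from the original code.

```python
import pandas as pd

# Level names and run columns as they appear in the table.
index_names = ["file_type", "server_count", "file_count",
               "thread_count", "cacheclear_type"]
run_names = ["run_001", "run_002", "run_003", "run_004", "run_005"]

# A few rows copied from the table: configuration tuple -> five timings.
rows = {
    ("gor", "01servers", "05files", "20threads", "ccALWAYS"):
        [15.918, 16.275, 15.807, 17.781, 16.233],
    ("gor", "08servers", "05files", "20threads", "ccALWAYS"):
        [17.061, 15.414, 16.819, 15.597, 16.818],
    ("gorz", "01servers", "05files", "20threads", "ccALWAYS"):
        [12.285, 11.218, 12.009, 14.122, 10.991],
    ("gor", "01servers", "10files", "20threads", "ccALWAYS"):
        [17.322, 17.636, 16.096, 16.484, 16.715],
}

# Build the five-level index from the tuples, then the frame itself.
index = pd.MultiIndex.from_tuples(list(rows), names=index_names)
df = pd.DataFrame(list(rows.values()), index=index, columns=run_names)

print(df)
```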

I would like to take all the gor, 01servers, 20threads, ccALWAYS configurations and create one data point for each of the XXfiles configurations. So to begin with, I'd like to somehow get a DataFrame that looks like this:

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+

How do I do that?

I managed to filter the data with the query() function, using the following code, so that it looks like the second table in the question:

df.query('file_type == "gor" & server_count == "01servers"').sort_index(level=2)
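
A self-contained sketch of that filtering step, followed by one possible way to get a single data point per file_count. It assumes a recent pandas, where `sort_index(level=...)` does the job that `sortlevel` did in older releases. Reducing the five runs to a mean and standard deviation is my assumption about what "one data point" means; the question does not say. Only a subset of the table's rows is copied, and the names `selected`, `per_config` and `summary` are illustrative.

```python
import pandas as pd

names = ["file_type", "server_count", "file_count",
         "thread_count", "cacheclear_type"]
runs = ["run_001", "run_002", "run_003", "run_004", "run_005"]

# Subset of the question's table, deliberately not in sorted order.
data = {
    ("gor", "01servers", "20files", "20threads", "ccALWAYS"):
        [23.744, 20.539, 21.416, 22.921, 22.794],
    ("gor", "08servers", "05files", "20threads", "ccALWAYS"):
        [17.061, 15.414, 16.819, 15.597, 16.818],
    ("gorz", "01servers", "05files", "20threads", "ccALWAYS"):
        [12.285, 11.218, 12.009, 14.122, 10.991],
    ("gor", "01servers", "05files", "20threads", "ccALWAYS"):
        [15.918, 16.275, 15.807, 17.781, 16.233],
    ("gor", "01servers", "15files", "20threads", "ccALWAYS"):
        [19.265, 17.128, 17.630, 18.739, 16.833],
    ("gor", "01servers", "10files", "20threads", "ccALWAYS"):
        [17.322, 17.636, 16.096, 16.484, 16.715],
}
df = pd.DataFrame(
    list(data.values()),
    index=pd.MultiIndex.from_tuples(list(data), names=names),
    columns=runs,
)

# query() can refer to named index levels as if they were columns.
selected = (
    df.query('file_type == "gor" and server_count == "01servers"')
      .sort_index(level="file_count")
)

# Drop the levels that are now constant so file_count is the only index.
per_config = selected.droplevel(
    ["file_type", "server_count", "thread_count", "cacheclear_type"]
)

# Assumed reduction: mean and sample standard deviation over the five runs.
summary = pd.DataFrame({
    "mean": per_config.mean(axis=1),
    "std": per_config.std(axis=1),
})

print(summary)
```

If matplotlib is installed, something like `summary["mean"].plot(yerr=summary["std"], marker="o")` should draw the means against file_count with error bars, which covers the plotting part of the title.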
