Creating a histogram for each value in a multi-index pandas DataFrame

Question

Below is a small section from my pandas dataframe. I would like to get separate 'vel_x' histograms (counts, bins) for each value in count. Is there a fast, built-in way to do this without looping through each value in count?

+-------+-------+-------+-------+--------+----+--------+
|       |       | x_loc | y_loc | vel_x  | …  |  vel_z |
+-------+-------+-------+-------+--------+----+--------+
| count | slice |       |       |        |    |        |
|   1   | 3     |     4 |     0 |     96 | 88 |     35 |
|       | 4     |    10 |     2 |     54 | 42 |     37 |
|       | 5     |     9 |    32 |      8 | 70 |     34 |
|       | 6     |    36 |    89 |     69 | 46 |     78 |
|   2   | 5     |    17 |    41 |     48 | 45 |     71 |
|       | 6     |    50 |    66 |     82 | 72 |     59 |
|       | 7     |    14 |    24 |     55 | 20 |     89 |
|       | 8     |    76 |    36 |     13 | 14 |     21 |
|   3   | 5     |    97 |    19 |     41 | 61 |     72 |
|       | 6     |    22 |     4 |     56 | 82 |     15 |
|       | 7     |    17 |    57 |     30 | 63 |     88 |
|       | 8     |    83 |    43 |     35 |  8 |      4 |
+-------+-------+-------+-------+--------+----+--------+

I have tried many methods (apply, map, etc.), but I have not been able to get any of them to work. Each method just applies the mapped function to all the row values.

Essentially, I want to map this to each value in count (count_value) below:

def create_histogram(data, count_value):
    values, bin_edges = np.histogram(data.loc[count_value, 'vel_x'])
    return values

Then something like this:

data.index.get_level_values('Count').map(create_histogram(data))

Also, for reference, this is how I currently do what I want, but it is not very efficient because my dataframe is very large.

for count_value in data.index.get_level_values('Count').unique():
    values, bin_edges = np.histogram(data.loc[count_value, 'vel_x'])

The returned values can then be stored in another array.

Thank you in advance for your help!

Answer 1

How about using groupby with the level param:

level : int, level name, or sequence of such, default None. If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

for count, sdf in df.groupby(level=0):
    values, bin_edges = np.histogram(sdf.loc[count, 'vel_x'])
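
For reference, here is a minimal, self-contained sketch of that loop which also stores each group's counts. The sample frame only reproduces the vel_x column from the question's table. The shared `bin_edges`, the `histograms` dict, and the lowercase level name `count` are assumptions made for illustration, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Sample data: only the vel_x column from the question's table.
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (1, 5), (1, 6),
     (2, 5), (2, 6), (2, 7), (2, 8),
     (3, 5), (3, 6), (3, 7), (3, 8)],
    names=["count", "slice"],
)
df = pd.DataFrame(
    {"vel_x": [96, 54, 8, 69, 48, 82, 55, 13, 41, 56, 30, 35]},
    index=index,
)

# Assumption: every group uses the same edges, so the counts are comparable.
bin_edges = np.linspace(0, 100, 6)  # five bins of width 20

# Map each value of the "count" level to its histogram counts.
histograms = {}
for count_value, sub_df in df.groupby(level="count"):
    counts, _ = np.histogram(sub_df["vel_x"], bins=bin_edges)
    histograms[count_value] = counts
```

Because each sub-frame already contains only one group's rows, `sub_df["vel_x"]` can be passed to `np.histogram` directly.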

UPDATE

Since you think the way mean(level=level) works is better, you can also try this approach, which is inspired by the mean source code:

df['vel_x'].groupby(level=0).aggregate(np.histogram)
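
If avoiding a Python-level loop over groups matters for a very large frame, one alternative is to bin everything in a single pass with NumPy. This is not part of the answer above, only a sketch under assumptions: all groups share the same bin edges, and `bin_edges`, `codes`, `uniques`, and `counts` are illustrative names. Unlike `np.histogram`, which drops out-of-range values, this sketch folds anything outside the edge range into the first or last bin.

```python
import numpy as np
import pandas as pd

# Sample data: only the vel_x column from the question's table.
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (1, 5), (1, 6),
     (2, 5), (2, 6), (2, 7), (2, 8),
     (3, 5), (3, 6), (3, 7), (3, 8)],
    names=["count", "slice"],
)
df = pd.DataFrame(
    {"vel_x": [96, 54, 8, 69, 48, 82, 55, 13, 41, 56, 30, 35]},
    index=index,
)

# Assumption: one set of edges shared by every group.
bin_edges = np.linspace(0, 100, 6)  # five bins of width 20

# Row number (0, 1, 2, ...) for each distinct value of the "count" level.
codes, uniques = pd.factorize(df.index.get_level_values("count"))

# Bin number for each vel_x value, using the interior edges only.
bin_idx = np.digitize(df["vel_x"].to_numpy(), bin_edges[1:-1])

# Accumulate one tally per (group, bin) pair without a Python loop.
counts = np.zeros((len(uniques), len(bin_edges) - 1), dtype=int)
np.add.at(counts, (codes, bin_idx), 1)

# One row per count value, one column per bin.
result = pd.DataFrame(counts, index=uniques)
```

Each row of `result` holds the histogram counts for one value of count, in the order given by `uniques`.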

Solution 1: 2, accepted, 2017-03-04 08:29:59
