
How to efficiently filter and sum by a pandas MultiIndex

I have a DataFrame with a MultiIndex where I would like to, as efficiently as possible:

  1. Filter by one index level (flag & flag_filter != 0)
  2. Group by the other two levels and sum the columns (df.groupby(['time', 'sensor'])[['col1', 'col2', 'col3']].sum())
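A minimal sketch of those two steps on a small synthetic frame. The bitmask value flag_filter = 0b10 and the constant column values are illustrative assumptions, not taken from the question. The bitwise test is evaluated only on the "flags" level via get_level_values, and the grouping uses the remaining level names directly.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame: 4 flag values x 2 times x 2 sensors.
index = pd.MultiIndex.from_product(
    [range(4), range(2), range(1, 3)], names=["flags", "time", "sensor"]
)
df = pd.DataFrame({"col1": 1.0, "col2": 2.0, "col3": 3.0}, index=index)

flag_filter = 0b10  # hypothetical bitmask: keep flags with bit 1 set (2 and 3)

# Step 1: evaluate (flags & flag_filter) != 0 on the "flags" level only.
flags = df.index.get_level_values("flags").to_numpy()
mask = (flags & flag_filter) != 0

# Step 2: group the surviving rows by the other two levels and sum the columns.
result = df[mask].groupby(level=["time", "sensor"])[["col1", "col2", "col3"]].sum()
print(result)
```

Each (time, sensor) group keeps two rows here (flags 2 and 3), so every col1 sum is 2.0.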

As a setup:

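import pandas as pd
import numpy as np

# Every combination of 255 flag values x 5000 time steps x 2 sensors
# (2,550,000 rows), sorted by flags, then time, then sensor
index = pd.MultiIndex.from_product(
    [
        range(0, 0xff),   # flags: 0..254
        range(0, 5000),   # time: 0..4999
        range(1, 3),      # sensor: 1 or 2
    ], names=["flags", "time", "sensor"]
)

# Three columns of uniform random floats in [0.0, 0.5)
data = pd.DataFrame({
    "col1": np.random.uniform(size=len(index), low=0.0, high=0.5),
    "col2": np.random.uniform(size=len(index), low=0.0, high=0.5),
    "col3": np.random.uniform(size=len(index), low=0.0, high=0.5),
}, index=index)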

I'm hoping to get from this a DataFrame with the same columns but an index of just time, sensor. The idea is to throw out the rows that don't match the filter and sum the rows that do, while still keeping the time, sensor grouping.
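A hand-checkable illustration of the desired output, on a hypothetical 12-row frame whose col1 values are 0..11 in index order. It keeps even flags (an illustrative filter) so the surviving rows and their sums can be verified by eye.

```python
import numpy as np
import pandas as pd

# Tiny frame: flags 0..2, time 0..1, sensor 1..2 -> 12 rows, col1 = 0.0 .. 11.0
index = pd.MultiIndex.from_product(
    [range(3), range(2), range(1, 3)], names=["flags", "time", "sensor"]
)
data = pd.DataFrame({"col1": np.arange(12.0)}, index=index)

# Keep flags 0 and 2, throw out flag 1
keep = data.index.get_level_values("flags") % 2 == 0

# One row per (time, sensor): same columns, "flags" level gone
result = data[keep].groupby(level=["time", "sensor"]).sum()
print(result["col1"].tolist())  # [8.0, 10.0, 12.0, 14.0]
```

For example, (time=0, sensor=1) collects 0.0 from flag 0 and 8.0 from flag 2, giving 8.0; the flag-1 value 4.0 is discarded.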

Combine .loc with droplevel:

# Let's say we want to filter for even flags
flag_filter = data.index.get_level_values("flags") % 2 == 0

# Select matching rows, drop the "flags" level, then sum the rows that share (time, sensor)
data.loc[flag_filter, :].droplevel(0).groupby(level=["time", "sensor"]).sum()
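A sketch comparing three ways to get the filtered (time, sensor) sums on a scaled-down copy of the setup (16 flags, 100 time steps, a seeded generator; all sizes are assumptions chosen so it runs quickly). Option (1) runs the .loc/droplevel selection followed by a groupby sum; option (2) skips droplevel, since grouping on the two level names ignores "flags" anyway; option (3) is a hypothetical NumPy reshape shortcut that relies on the index being a complete, sorted product grid whose flag values are exactly 0..n_flags-1.

```python
import numpy as np
import pandas as pd

# Scaled-down version of the question's setup so the comparison runs quickly.
n_flags, n_time, n_sensor = 16, 100, 2
index = pd.MultiIndex.from_product(
    [range(n_flags), range(n_time), range(1, n_sensor + 1)],
    names=["flags", "time", "sensor"],
)
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.uniform(0.0, 0.5, size=(len(index), 3)),
    index=index,
    columns=["col1", "col2", "col3"],
)
flag_filter = data.index.get_level_values("flags") % 2 == 0  # even flags

# (1) .loc selection + droplevel, followed by a sum over the remaining levels.
via_droplevel = (
    data.loc[flag_filter, :].droplevel(0).groupby(level=["time", "sensor"]).sum()
)

# (2) Same result without droplevel: group directly on the two level names.
via_levels = data.loc[flag_filter].groupby(level=["time", "sensor"]).sum()

# (3) Reshape shortcut. ASSUMPTION: every (flags, time, sensor) combination is
#     present exactly once, sorted, with flag values 0..n_flags-1 (from_product).
assert data.index.is_monotonic_increasing  # necessary, but not sufficient
values = data.to_numpy().reshape(n_flags, n_time * n_sensor, data.shape[1])
keep_flags = np.arange(n_flags) % 2 == 0  # filter evaluated once per flag value
via_reshape = pd.DataFrame(
    values[keep_flags].sum(axis=0),
    index=pd.MultiIndex.from_product(
        [range(n_time), range(1, n_sensor + 1)], names=["time", "sensor"]
    ),
    columns=data.columns,
)

print(np.allclose(via_levels.to_numpy(), via_reshape.to_numpy()))  # True
```

Options (1) and (2) work for any MultiIndex. Option (3) avoids building a filtered frame and group keys, but it silently returns wrong sums if any combination is missing, duplicated, or out of order, so it only fits data built like the question's from_product grid; time the options on real data before choosing. For a bitmask filter instead of even flags, a test such as (np.arange(n_flags) & mask_bits) != 0 could replace keep_flags, where mask_bits is a hypothetical integer bitmask.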

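Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.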

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM