
Pandas Panel fancy indexing: How to return (index of) all DataFrames in Panel based on Boolean of multiple columns in each df

I've got a Pandas Panel with many DataFrames that share the same row and column labels. I want to make a new Panel containing only the DataFrames that fulfill certain criteria based on a couple of columns.

This is easy with DataFrames and rows. Say I have a df, zHe_compare. I can get the suitable rows with:

zHe_compare[((zHe_compare['zHe_calc'] > 100) & (zHe_compare['zHe_med'] > 100)) | ((zHe_obs_lo_2s <= zHe_compare['zHe_calc']) & (zHe_compare['zHe_calc'] <= zHe_obs_hi_2s))]

But how do I do the following (pseudocode, simplified boolean):

good_results_panel = results_panel[ all_dataframes[ sum ('zHe_calc' < 'zHe_obs') > min_num ] ]

I know the inner boolean part, but how do I specify it for each DataFrame in a Panel? Because I need multiple columns from each df, I haven't had success with the panel.minor_xs slicing techniques.

Thanks!

As mentioned in its documentation, Panel is currently a bit under-developed, so the sweet syntax you've come to rely on when working with DataFrame isn't there yet.

Meanwhile, I would suggest using the Panel.select method:

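def is_good_result(item_label):
    # Panel.select passes the item label, so look up the DataFrame first
    df = results_panel[item_label]
    # whatever condition over the selected item; 'col1' is a placeholder
    return df['col1'].sum() > 5

good_results = results_panel.select(is_good_result)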

The is_good_result function returns a boolean value. Note that its argument is not a DataFrame instance, because Panel.select applies the criterion function to each item label rather than to the DataFrame stored under that label.

Of course, you can stuff that whole criterion function into a lambda in one statement, if you're into the whole brevity thing:

good_results = results_panel.select(
    lambda item_label: results_panel[item_label]['col1'].sum() > 5
)
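
For readers on a pandas release where Panel is no longer available (it was deprecated and later removed), the same label-by-label filtering can be sketched with a plain dict of DataFrames standing in for the Panel. This is only an illustration under assumptions: the frames dict, the run_a/run_b/run_c labels, the sample values, and the min_num threshold are all made up here, and the criterion is one reading of the pseudocode in the question (count the rows where zHe_calc is below zHe_obs).

```python
import pandas as pd

# Hypothetical stand-in for the Panel: DataFrames with identical columns,
# keyed by item label. All labels and values below are invented.
frames = {
    "run_a": pd.DataFrame({"zHe_calc": [90, 110, 120], "zHe_obs": [100, 115, 125]}),
    "run_b": pd.DataFrame({"zHe_calc": [130, 140, 150], "zHe_obs": [100, 115, 125]}),
    "run_c": pd.DataFrame({"zHe_calc": [90, 120, 150], "zHe_obs": [100, 115, 125]}),
}

min_num = 1  # assumed threshold, not taken from the question


def is_good_result(df):
    # Count the rows where zHe_calc < zHe_obs and compare with the threshold.
    return (df["zHe_calc"] < df["zHe_obs"]).sum() > min_num


# Keep only the items whose DataFrame satisfies the multi-column criterion.
good_results = {label: df for label, df in frames.items() if is_good_result(df)}
good_labels = sorted(good_results)
```

Unlike Panel.select, the criterion here receives the DataFrame directly, so it can combine as many columns as needed without a lookup by label.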
