Pandas Panel fancy indexing: How to return (index of) all DataFrames in Panel based on Boolean of multiple columns in each df
I've got a Pandas Panel containing many DataFrames that share the same row and column labels. I want to make a new Panel holding only the DataFrames that fulfill certain criteria based on a couple of columns.
This is easy with a DataFrame and rows. Say I have a df, zHe_compare. I can get the suitable rows with:
zHe_compare[(zHe_compare['zHe_calc'] > 100) & (zHe_compare['zHe_med'] > 100) | ((zHe_obs_lo_2s <=zHe_compare['zHe_calc']) & (zHe_compare['zHe_calc'] <= zHe_obs_hi_2s))]
But how do I do this (pseudocode, simplified boolean):
good_results_panel = results_panel[ all_dataframes[ sum ('zHe_calc' < 'zHe_obs') > min_num ] ]
I know the inner boolean part, but how do I specify it for each DataFrame in a Panel? Because I need multiple columns from each df, I haven't had success with the panel.minor_xs slicing techniques.
Thanks!
As mentioned in its documentation, Panel is currently a bit under-developed, so the sweet syntax you've come to rely on when working with DataFrame isn't there yet.
Meanwhile, I would suggest using the Panel.select method:
def is_good_result(item_label):
    # whatever condition over the selected item
    df = results_panel[item_label]
    return df['col1'].sum() > 5

good_results = results_panel.select(is_good_result)
The is_good_result function returns a boolean value. Note that its argument is not a DataFrame instance, because Panel.select applies its argument to the item label, rather than to the DataFrame content of that item.
Of course, you can stuff that whole criterion function into a lambda in one statement, if you're into the whole brevity thing:
good_results = results_panel.select(
    lambda item_label: results_panel[item_label]['col1'].sum() > 5
)
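For readers on a pandas release where Panel is no longer available (it was deprecated and later removed), here is a minimal sketch of the same per-item filtering with a plain dict of DataFrames standing in for the Panel. The item labels, the sample values, and the min_num threshold are made up for illustration; only the zHe_calc and zHe_obs column names come from the question's pseudocode.

```python
import pandas as pd

# Hypothetical stand-in for the Panel: item label -> DataFrame,
# all sharing the same column labels. Values are invented sample data.
results = {
    "run_a": pd.DataFrame({"zHe_calc": [90, 120, 80], "zHe_obs": [100, 110, 95]}),
    "run_b": pd.DataFrame({"zHe_calc": [130, 140, 150], "zHe_obs": [100, 110, 95]}),
    "run_c": pd.DataFrame({"zHe_calc": [50, 60, 200], "zHe_obs": [100, 110, 95]}),
}
min_num = 1  # assumed threshold from the question's pseudocode


def is_good_result(df):
    # Count rows where the calculated value is below the observed one,
    # then compare that count against the threshold.
    return (df["zHe_calc"] < df["zHe_obs"]).sum() > min_num


# Keep only the items whose DataFrame passes the criterion.
good_results = {label: df for label, df in results.items() if is_good_result(df)}
good_labels = list(good_results)

# Optionally stack the survivors into one DataFrame whose outer
# index level is the item label.
combined = pd.concat(good_results)
```

Unlike the Panel.select predicate above, the criterion here receives the DataFrame itself rather than its label, so it can use as many columns as it needs directly.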