
Filtering syntax for pandas dataframe groupby with logic condition

I have a pandas dataframe containing indices that have a one-to-many relationship. A very simplified and shortened example of my data is shown in the DataFrame Example link. I want to get a list or Series or ndarray of the unique namIdx values in which nCldLayers <= 1. The final result should show indices of 601 and 603.

  1. I am able to accomplish this with the 3 statements below, but I am wondering if there is a much better, more succinct way with perhaps 'filter', 'select', or 'where'.

     grouped = (namToViirs['nCldLayers'] <= 1).groupby(namToViirs.index).all(axis=0)
     grouped = grouped[grouped == True]
     filterIndex = grouped.index
  2. Is there a better approach to this result that applies the logical condition (namToViirs['nCldLayers'] <= 1) in a later part of the chain, i.e., first group, then apply the logical condition, and then retrieve only the namIdx values for which the condition is true for every member of the group?

I think your code works nicely; it only needs a couple of small changes:

In all, axis=0 can be omitted.
In grouped==True, the ==True can be omitted, because grouped is already boolean.

grouped=(namToViirs['nCldLayers']<=1).groupby(level='namldx').all()
grouped = grouped[grouped]
filterIndex = grouped.index
print (filterIndex)
Int64Index([601, 603], dtype='int64', name='namldx')
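
To try the shortened version end to end, here is a minimal sketch. The sample values are an assumption (the original data is only available through the link), chosen so that 601 and 603 qualify; the index name follows the spelling used in the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for namToViirs: index labels repeat (one-to-many).
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 0, 1]},
    index=pd.Index([601, 601, 602, 602, 603, 603], name="namldx"),
)

# True for every row that satisfies the condition.
mask = namToViirs["nCldLayers"] <= 1

# A group is True only if all of its rows satisfy the condition.
grouped = mask.groupby(level="namldx").all()

# Indexing a boolean Series with itself keeps only the True entries.
filterIndex = grouped[grouped].index

print(filterIndex.tolist())  # [601, 603]
```

Label 602 is dropped because one of its rows has nCldLayers equal to 2, even though its other row passes the test.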

I think it is better to apply the boolean condition first and then group, because that needs fewer loops and therefore performs better.

For question 1, see jezrael's answer. For question 2, you could treat the indexes as sets:

namToViirs.index[namToViirs.nCldLayers <= 1] \
          .difference(namToViirs.index[namToViirs.nCldLayers > 1])
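
The difference is needed because the first index expression alone keeps every label that has at least one qualifying row. A small sketch of the two steps, using made-up sample values (an assumption, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for namToViirs: index labels repeat (one-to-many).
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 0, 1]},
    index=pd.Index([601, 601, 602, 602, 603, 603], name="namIdx"),
)

# Labels with at least one row satisfying the condition (labels may repeat).
some_ok = namToViirs.index[namToViirs.nCldLayers <= 1]

# Labels with at least one row violating the condition.
some_bad = namToViirs.index[namToViirs.nCldLayers > 1]

# difference() returns the unique labels of some_ok that are not in some_bad.
filterIndex = some_ok.difference(some_bad)

print(some_ok.tolist())      # [601, 601, 602, 603, 603]
print(filterIndex.tolist())  # [601, 603]
```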

You might be interested in this answer.

The implementation is currently a bit hackish, but it should reduce your statement above to:

filterIndex = ((namToViirs['nCldLayers'] <= 1)
                .groupby(namToViirs.index).all()[W].index)

EDIT: also see this answer for an analogous approach not requiring external components, resulting in:

filterIndex = ((namToViirs['nCldLayers'] <= 1)
                .groupby(namToViirs.index).all()[lambda x: x].index)
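
This works because indexing a Series with a callable calls it with the Series and uses the result as the indexer. A minimal sketch with hypothetical sample values (an assumption, not the asker's data) showing that the lambda form matches plain boolean self-indexing:

```python
import pandas as pd

# Hypothetical stand-in for namToViirs: index labels repeat (one-to-many).
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 0, 1]},
    index=pd.Index([601, 601, 602, 602, 603, 603], name="namIdx"),
)

# One boolean per label: True only if every row of that label passes.
all_ok = (namToViirs["nCldLayers"] <= 1).groupby(namToViirs.index).all()

# s[callable] calls the callable with s and indexes with the result,
# so for a boolean Series s[lambda x: x] selects the same rows as s[s].
via_callable = all_ok[lambda x: x]
via_mask = all_ok[all_ok]

print(via_callable.equals(via_mask))  # True
print(via_callable.index.tolist())    # [601, 603]
```

The callable form is what lets the whole expression stay in one chain, since no intermediate variable is needed to refer back to the grouped result.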

Another option is to use .pipe() and a function which applies the desired filtering.

For instance:

filterIndex = ((namToViirs['nCldLayers'] <= 1)
                .groupby(namToViirs.index)
                .all()
                .pipe(lambda s: s[s])
                .index)
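
If the same filtering is needed in several places, the lambda can be swapped for a named function. The sketch below assumes made-up sample values, and keep_true is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def keep_true(s: pd.Series) -> pd.Series:
    """Hypothetical helper: keep only the True entries of a boolean Series."""
    return s[s]


# Hypothetical stand-in for namToViirs: index labels repeat (one-to-many).
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 0, 1]},
    index=pd.Index([601, 601, 602, 602, 603, 603], name="namIdx"),
)

filterIndex = (
    (namToViirs["nCldLayers"] <= 1)
    .groupby(namToViirs.index)
    .all()             # True only if every row of the label passes
    .pipe(keep_true)   # drop the labels whose result is False
    .index
)

print(filterIndex.tolist())  # [601, 603]
```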

