简体   繁体   中英

Filter rows containing certain values in the columns

I want to filter out rows contains specific values through out multiple columns in data frame.

Eg

    code tag number floor  note
1   1111  *   **     34     no 
2   2323  7   899     7     no
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no
6   3677  55  908    11     no

I want to filter out all rows contains #, *, ** in columns code, tag, number, floor.

What I want to get is

    code tag number floor  note
1   1111  *   **     34     no 
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no

I was trying to use isin method in data frame, but it does work with one column, but does not work in multiple columns. Thanks!

I think you need apply , isin , and any with boolean indexing:

list = ['#','*','**']
cols = ['code','tag','number','floor']
df[df[cols].apply(lambda x: x.isin(list).any(), axis=1)]

Output:

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

You could also use df.applymap :

s = {'*', '**', '#'}
df[df.applymap(lambda x: x in s).max(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

piR suggested a crazy (but it works!) alternative:

df[df.apply(set, 1) & {'*', '**', '#'}]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 1
Assuming no other pre-existing pir

df[df.replace(['#', '*', '**'], 'pir').eq('pir').any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 2
Obnoxious numpy broadcasting. Fast at first but scales quadratically

df[(df.values[None, :] == np.array(['*', '**', '#'])[:, None, None]).any(0).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 3
Less obnoxious np.in1d

df[np.in1d(df.values, ['*', '**', '#']).reshape(df.shape).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 4
Over the top with map

df[list(
    map(bool,
        map({'*', '**', '#'}.intersection,
            map(set,
                zip(*(df[c].values.tolist() for c in df)))))
)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM