
Pandas DataFrame list comparison for every row in a column

I have a DataFrame

In [3]: df
Out[3]:
                             Price  Size        Codes
2015-04-13 06:14:49-04:00  100.200   900     FT,R6,IS
2015-04-13 06:14:54-04:00  100.190   100     FT,R6,IS
2015-04-13 06:14:54-04:00  100.190   134     FT,R6,IS
2015-04-13 06:15:02-04:00  100.170   200     FT,R6,IS
...                            ...   ...          ...
[248974 rows x 3 columns]

and a list

exclude = ['R6', 'F2', 'IS']

If one of the items of exclude is in a row of df under the Codes column, I would like to filter out that row.

I figured out that I can do this

In [4]: df.Codes.str.split(',')
Out[4]:
2015-04-13 06:14:49-04:00        [FT, R6, IS]
2015-04-13 06:14:54-04:00        [FT, R6, IS]
2015-04-13 06:14:54-04:00        [FT, R6, IS]
2015-04-13 06:15:02-04:00        [FT, R6, IS]
...
Name: Codes, Length: 248974

Essentially what I want is to query along the lines of df[df.Codes.split(',') in exclude] or something like that. Any help greatly appreciated.

# flag rows whose Codes column contains at least one code from exclude
df['check'] = df['Codes'].apply(lambda code: 1 if [elt for elt in code.split(',') if elt in exclude] else 0)
df_filtered_out = df[df['check'] == 1]  # the rows to be filtered out
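
If the goal is to keep only the rows without any excluded code (which is what the question asks for), the same flag can simply be inverted; a minimal follow-up, where df_kept is an illustrative name:

df_kept = df[df['check'] == 0]  # rows whose Codes contain none of the excluded values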

Just in case: on a Series, apply() calls the function on each value (check the pandas docs for more info), and a Python list is falsy when empty and truthy otherwise, so the "if [elt for elt in ...]" condition is True exactly when at least one code matches.
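
For reference, a minimal sketch of the same idea written as a boolean mask instead of a 0/1 helper column (the set intersection and the names mask and df_without_excluded are illustrative assumptions, not part of the original answer):

# True where the row's Codes share at least one element with exclude
mask = df['Codes'].apply(lambda code: bool(set(code.split(',')) & set(exclude)))
df_without_excluded = df[~mask]  # keep only rows with no excluded code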

# for the sake of performance, we turn the lookup list into a set
excludes = {'R7', 'R5'}

# keep only the rows whose Codes contain none of the excluded values
ix = df.Codes.str.split(',').apply(lambda codes: not any(c in excludes for c in codes))
df[ix]  # returns the filtered DataFrame
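
Applied to the exclude list from the question, the same pattern might look like this (a sketch; keep and df_clean are illustrative names, and it assumes Codes always holds a comma-separated string):

exclude = {'R6', 'F2', 'IS'}  # the question's codes, stored as a set for fast lookup
keep = df.Codes.str.split(',').apply(lambda codes: not any(c in exclude for c in codes))
df_clean = df[keep]           # DataFrame without the unwanted rows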
