Pandas DataFrame list comparison for every row in a column
I have a DataFrame:
In [3]: df
Out[3]:
Price Size Codes
2015-04-13 06:14:49-04:00 100.200 900 FT,R6,IS
2015-04-13 06:14:54-04:00 100.190 100 FT,R6,IS
2015-04-13 06:14:54-04:00 100.190 134 FT,R6,IS
2015-04-13 06:15:02-04:00 100.170 200 FT,R6,IS
... ... ... ...
[248974 rows x 3 columns]
and a list:
exclude = ['R6', 'F2', 'IS']
If one of the items of exclude is in a row of df under the Codes column, I would like to filter out that row.
I figured out that I can do this:
In [4]: df.Codes.str.split(',')
Out[4]:
2015-04-13 06:14:49-04:00 [FT, R6, IS]
2015-04-13 06:14:54-04:00 [FT, R6, IS]
2015-04-13 06:14:54-04:00 [FT, R6, IS]
2015-04-13 06:15:02-04:00 [FT, R6, IS]
...
Name: Codes, Length: 248974
Essentially what I want is to query along the lines of df[df.Codes.split(',') in exclude] or something like that. Any help greatly appreciated.
df['check'] = df['Codes'].apply(lambda code: 1 if [elt for elt in code.split(',') if elt in exclude] else 0)
df_filtered_out = df[df['check'] == 1]
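Just in case: on a single column, apply() calls the function once per element, i.e. once per row of that column (see the pandas documentation for more info), and if some_list evaluates to False if some_list is empty and True otherwise. So check is 1 for rows that contain at least one excluded code, and df_filtered_out holds exactly those rows; to keep the remaining rows instead, select df[df['check'] == 0].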
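To make the truthiness trick concrete, here is a minimal sketch on a tiny made-up frame. The sample values and the helper name has_excluded are assumptions for illustration only; the exclude list is the one from the question. It builds a boolean mask rather than a 0/1 column, which lets the same mask select either the kept or the dropped rows.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame (values are made up).
df = pd.DataFrame(
    {
        "Price": [100.20, 100.19, 100.17],
        "Size": [900, 100, 200],
        "Codes": ["FT,R6,IS", "FT", "FT,F2"],
    }
)
exclude = ["R6", "F2", "IS"]


def has_excluded(codes):
    # The list comprehension collects the codes found in exclude;
    # an empty list is falsy, a non-empty one is truthy.
    return bool([c for c in codes.split(",") if c in exclude])


# Series.apply calls has_excluded once per element of the Codes column.
mask = df["Codes"].apply(has_excluded)

dropped = df[mask]   # rows containing at least one excluded code
kept = df[~mask]     # rows with no excluded code
```

Here kept is the frame the question asks for, while dropped corresponds to df_filtered_out in the answer above.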
# for the sake of performance, we turn the lookup list into a set
excludes = set(['R7', 'R5'])
ix = df.Codes.str.split(',').apply(lambda codes: not any(c in excludes for c in codes))
df[ix] # returns the filtered DataFrame
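As a variation on the set-based lookup, the same test can be written with set.isdisjoint, which is True when a row shares no code with the exclusion set. This is only a sketch: the sample frame is made up, and it uses the codes from the question's exclude list rather than the ones in the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the Codes column matters here.
df = pd.DataFrame({"Codes": ["FT,R6,IS", "FT", "FT,F2", "R7,FT"]})

# A set gives constant-time membership checks.
excludes = {"R6", "F2", "IS"}

# For each row, split the string into a list of codes and keep the row
# only if none of its codes appears in excludes.
ix = df["Codes"].str.split(",").apply(lambda codes: excludes.isdisjoint(codes))

filtered = df[ix]  # rows without any excluded code
```

The any(...) form in the answer and isdisjoint give the same mask; which one reads better is a matter of taste.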