
How to filter a pandas DataFrame using contains against a list of columns, if I don't know which columns are present?

I want to filter my DataFrame for rows where any of several columns contains a known string. I know you can do something like this:

summ_proc = summ_proc[
    summ_proc['data.process.name'].str.contains(indicator) |
    summ_proc['data.win.eventdata.processName'].str.contains(indicator) |
    summ_proc['data.win.eventdata.logonProcessName'].str.contains(indicator) |
    summ_proc['syscheck.audit.process.name'].str.contains(indicator)
]

Here I'm using the | operator to check against multiple columns. But there are cases where a certain column isn't present, so 'data.process.name' might not be there every time.

I tried the following implementation:

summ_proc[summ_proc.apply(lambda x: summ_proc['data.process.name'].str.contains(indicator) if 'data.process.name' in summ_proc.columns else summ_proc)]

And that works. But I'm not sure how to apply the OR operator to this lambda function. I want all the rows where data.process.name, data.win.eventdata.processName, data.win.eventdata.logonProcessName, or syscheck.audit.process.name contains the indicator.

EDIT:

I tried the following approach, where I created individual frames and concatenated them.

summ_proc1 = summ_proc[summ_proc.apply(lambda x: summ_proc['data.process.name'].str.contains(indicator) if 'data.process.name' in summ_proc.columns else summ_proc)]
summ_proc2 = summ_proc[summ_proc.apply(lambda x: summ_proc['data.win.eventdata.processName'].str.contains(indicator) if 'data.win.eventdata.processName' in summ_proc.columns else summ_proc)]
summ_proc3 = summ_proc[summ_proc.apply(lambda x: summ_proc['data.win.eventdata.logonProcessName'].str.contains(indicator) if 'data.win.eventdata.logonProcessName' in summ_proc.columns else summ_proc)]
frames = [summ_proc1, summ_proc2, summ_proc3]
result = pd.concat(frames)

This works, but I'm curious whether there's a better, more Pythonic approach, or whether the current method will cause downstream issues.
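On the downstream question, one effect worth checking is duplication. The sketch below uses made-up data (the frame `df` and its values are hypothetical) and assumes each per-column frame is a plain Boolean filter: a row that matches in more than one column then appears once per matching frame after `pd.concat`.

```python
import pandas as pd

# Hypothetical toy data; column names borrowed from the question.
df = pd.DataFrame({
    "data.process.name": ["cmd.exe", "bash", "cmd.exe"],
    "data.win.eventdata.processName": ["cmd.exe", "sshd", "explorer.exe"],
})
indicator = "cmd"

# One plain Boolean filter per column, then concatenate as in the question.
frames = [
    df[df[c].str.contains(indicator, na=False)]
    for c in df.columns
]
combined = pd.concat(frames)

# Row 0 matches in both columns, so its index label appears twice.
n_duplicated = int(combined.index.duplicated().sum())

# Dropping repeated index labels restores one row per original row.
deduplicated = combined[~combined.index.duplicated()].sort_index()
```

Combining Boolean masks with OR before indexing avoids this, because each row is selected at most once.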

Something like this should work:

import numpy as np

columns = ['data.process.name', 'data.win.eventdata.processName']

# keep only the columns that are actually present in summ_proc
available_columns = [c for c in columns if c in summ_proc.columns]

# one Boolean Series per column, True where the column contains indicator;
# na=False treats missing values as non-matches so the masks stay Boolean
ss = [summ_proc[c].str.contains(indicator, na=False) for c in available_columns]

# OR the masks together without chaining '|' by using 'np.logical_or.reduce'
indexer = np.logical_or.reduce(ss)

result = summ_proc[indexer]
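The same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `filter_any_contains` and the toy frame are hypothetical, `regex=False` assumes the indicator is a literal string rather than a pattern, and returning an empty frame when none of the candidate columns exist is one possible choice for that edge case.

```python
import numpy as np
import pandas as pd


def filter_any_contains(df, columns, indicator):
    """Return rows where any present column in `columns` contains `indicator`."""
    # Ignore candidate columns that this particular frame does not have.
    available = [c for c in columns if c in df.columns]
    if not available:
        # No candidate column exists, so nothing can match: return no rows.
        return df.iloc[0:0]
    # regex=False matches the indicator literally; na=False counts NaN as no match.
    masks = [
        df[c].str.contains(indicator, regex=False, na=False)
        for c in available
    ]
    # Element-wise OR across all per-column masks.
    return df[np.logical_or.reduce(masks)]


# Hypothetical toy data with one of the wanted columns missing.
df = pd.DataFrame({
    "data.process.name": ["cmd.exe", None, "bash"],
    "syscheck.audit.process.name": ["sshd", "cmd.exe", "cron"],
})
wanted = [
    "data.process.name",
    "data.win.eventdata.processName",  # absent from df on purpose
    "syscheck.audit.process.name",
]
result = filter_any_contains(df, wanted, "cmd")  # keeps rows 0 and 1
```

The explicit empty-list check matters because reducing an empty list of masks yields a single scalar rather than a row mask, which cannot be used to index the frame.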

