
Searching within lists in Pandas dataframe Column, Error

Situation: I'm working with a database that has over 1,000 tables, and I want to filter the table names based on the column names each table contains. When I run str.contains() on my dataframe, I get an error: "None of [Float64Index([nan, nan, nan, nan, nan], dtype='float64')] are in the [columns]". I was able to reproduce the error with dummy data.

My goal is to return a dataframe filtered to 'table5', because it is the only table that contains the column name 'date'.

import pandas as pd

listoftables = ['table1', 'table2', 'table3', 'table4', 'table5']
columnnames = [['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['audit', 'auditrunlist', 'date']]


example = pd.DataFrame(
    {'TableName': listoftables,
     'col_names'  : columnnames
    })

example[(example['col_names'].str.contains('date'))]

I think the error occurs because I'm searching for a string within a list. What confuses me more is that if I run example[(example['col_names'].str.contains('[audit, auditrunlist, date]'))], I get the same error.

If I add one more column that holds plain strings instead of lists, I get the results I expect:
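A small diagnostic sketch may help here. It assumes a reasonably recent pandas release, and the name `mask` is introduced only for illustration. The .str methods operate on string cells, so for cells that hold lists str.contains() produces NaN instead of True/False. Indexing the dataframe with an all-NaN result is not a valid boolean filter, which is consistent with the error quoted above (the exact exception text can differ between pandas versions).

```python
import pandas as pd

example = pd.DataFrame({
    "TableName": ["table1", "table2", "table3", "table4", "table5"],
    "col_names": [["age", "name", "school"]] * 4
                 + [["audit", "auditrunlist", "date"]],
})

# Each cell is a list, not a string, so the string method cannot match
# anything and returns a missing value for every row.
mask = example["col_names"].str.contains("date")

print(mask.isna().all())  # True: no row produced a usable True/False
```

Inspecting the mask before using it to index the dataframe makes this kind of problem easier to spot.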


listoftables = ['table1', 'table2', 'table3', 'table4', 'table5']
columnnames = [['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['age', 'name', 'school'], 
               ['audit', 'auditrunlist', 'date']]

no_list_columnnames = ['age, name, school',
                       'age name school',
                       'age name school',
                       'age name school',
                       'audit auditrunlist date']


example = pd.DataFrame(
    {'TableName': listoftables,
     'col_names'  : columnnames,
     'no_list_col_names' : no_list_columnnames
    })

# this returns what i expect
example[(example['no_list_col_names'].str.contains('date'))]

I think I have two options: I can either find a way to search within a list in a pandas dataframe, or I can convert the column from lists to strings.

What is the better way to filter a pandas dataframe with a list as a column?
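For the second option (converting the lists to strings), one possible sketch uses Series.str.join, which joins the elements of list cells with a delimiter. The names `joined` and `filtered` are illustrative and not part of the original post. Note that this is a substring search, so 'date' would also match a column called, say, 'update_time'.

```python
import pandas as pd

example = pd.DataFrame({
    "TableName": ["table1", "table2", "table3", "table4", "table5"],
    "col_names": [["age", "name", "school"]] * 4
                 + [["audit", "auditrunlist", "date"]],
})

# Join each list into one space-separated string, e.g. "age name school".
joined = example["col_names"].str.join(" ")

# Plain substring search on the joined strings (regex=False avoids
# treating the search term as a regular expression).
filtered = example[joined.str.contains("date", regex=False)]

print(filtered["TableName"].tolist())  # ['table5']
```

If exact column-name matches are required, testing list membership directly is likely the safer choice.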

Thanks for pointing out this issue; it is an interesting one.

My approach would be to use a classic apply to create a flag column:

example['flag'] = example.apply(lambda row: 1 if 'date' in row['col_names'] else 0, axis=1)

Then I would filter on the flag:

example_filtered = example.loc[example['flag'] == 1, :]

There are probably cleverer options, but this does the job.
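As a variation on the same idea, the membership test can be applied to the column alone and used directly as a boolean mask, which avoids adding a helper column to the dataframe. This is only a sketch; the function name `tables_with_column` is hypothetical.

```python
import pandas as pd


def tables_with_column(df, column_name):
    """Return the rows of df whose 'col_names' list contains column_name."""
    # Series.apply runs the membership test once per list cell and
    # returns a boolean Series aligned with df's index.
    mask = df["col_names"].apply(lambda cols: column_name in cols)
    return df[mask]


example = pd.DataFrame({
    "TableName": ["table1", "table2", "table3", "table4", "table5"],
    "col_names": [["age", "name", "school"]] * 4
                 + [["audit", "auditrunlist", "date"]],
})

result = tables_with_column(example, "date")
print(result["TableName"].tolist())  # ['table5']
```

Because `in` on a list checks whole elements, 'date' does not match 'update' or 'auditdate'.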

This can be achieved in multiple ways.

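-- Filter with a boolean list built from a membership test

example = example[['date' in cols for cols in example['col_names']]]

-- Expand the lists and then filter. The code looks nicer but may need more memory, because every list element becomes its own row. Note that the result then holds only the matching name in col_names, not the original list.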

example = example.explode('col_names')
example = example[example['col_names'] == 'date']
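If the original list should be kept in the result, the exploded values can be used only to build a mask, which is then applied to the untouched dataframe. The following is a sketch under that assumption; `exploded`, `mask`, and `filtered` are illustrative names.

```python
import pandas as pd

example = pd.DataFrame({
    "TableName": ["table1", "table2", "table3", "table4", "table5"],
    "col_names": [["age", "name", "school"]] * 4
                 + [["audit", "auditrunlist", "date"]],
})

# Explode only the column: each list element gets its own row, and the
# original row label is repeated in the index.
exploded = example["col_names"].explode()

# Compare every element with the wanted name, then collapse back to one
# True/False per original row.
mask = exploded.eq("date").groupby(level=0).any()

filtered = example[mask]
print(filtered["TableName"].tolist())  # ['table5']
```

Here `filtered` still contains the full list ['audit', 'auditrunlist', 'date'] in col_names.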
