Situation: I'm running into this issue while working with a database with over 1,000 tables. I want to filter the table names based on their column names. I'm trying to run str.contains()
on my dataframe, but I get an error. The error reads "None of [Float64Index([nan, nan, nan, nan, nan], dtype='float64')] are in the [columns]".
I was able to reproduce the error with dummy data.
My goal is to return a dataframe filtered to 'table5', because it contains the column name 'date'.
import pandas as pd

listoftables = ['table1', 'table2', 'table3', 'table4', 'table5']
columnnames = [['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['audit', 'auditrunlist', 'date']]
example = pd.DataFrame(
    {'TableName': listoftables,
     'col_names': columnnames
     })
# this raises the error quoted above
example[(example['col_names'].str.contains('date'))]
I think the error is because I'm searching for a string within a list. What confuses me more is that if I run example[(example['col_names'].str.contains('[audit, auditrunlist, date]'))],
I get the same error.
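A likely explanation, sketched here on the assumption that pandas behaves the way the error message suggests: the .str methods expect string elements, so for list elements str.contains() yields a missing value instead of True or False, whatever pattern is passed. Indexing the dataframe with an all-NaN result is then what fails. The snippet reuses the names from the dummy data, trimmed to two rows.

```python
import pandas as pd

# Two rows of the dummy data are enough to show the effect.
example = pd.DataFrame({
    'TableName': ['table1', 'table5'],
    'col_names': [['age', 'name', 'school'],
                  ['audit', 'auditrunlist', 'date']],
})

# Each element is a list, not a string, so the string method has nothing
# to match against and gives back a missing value for every row.
mask = example['col_names'].str.contains('date')
print(mask)
print(mask.isna().all())
```

If every value in the mask is missing, it is not a usable boolean filter, which would be consistent with the Float64Index of nan values shown in the error.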
If I add one more column that isn't a list, I get the results I expect:
listoftables = ['table1', 'table2', 'table3', 'table4', 'table5']
columnnames = [['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['age', 'name', 'school'],
               ['audit', 'auditrunlist', 'date']]
no_list_columnnames = ['age, name, school',
                       'age name school',
                       'age name school',
                       'age name school',
                       'audit auditrunlist date']
example = pd.DataFrame(
    {'TableName': listoftables,
     'col_names': columnnames,
     'no_list_col_names': no_list_columnnames
     })
# this returns what I expect
example[(example['no_list_col_names'].str.contains('date'))]
I think I have two options: I can either find a way to search within a list in a pandas dataframe, or I can find a way to convert a column in a pandas dataframe from a list to a string.
What is the better way to filter a pandas dataframe with a list as a column?
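As a rough sketch of the second option (converting the list column to a string), Series.str.join can concatenate each list with a delimiter, after which str.contains() behaves as it does for no_list_col_names. The column name col_names_str is made up for this illustration. Keep in mind that this is a substring match, so 'date' would also match a column called 'updated'.

```python
import pandas as pd

example = pd.DataFrame({
    'TableName': ['table1', 'table2', 'table3', 'table4', 'table5'],
    'col_names': [['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['audit', 'auditrunlist', 'date']],
})

# Join each list into one space-separated string
# (col_names_str is a hypothetical helper column).
example['col_names_str'] = example['col_names'].str.join(' ')

# Substring search now works because every element is a string.
filtered = example[example['col_names_str'].str.contains('date')]
print(filtered['TableName'].tolist())  # ['table5']
```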
Thanks for pointing out this issue, it is an interesting one.
My approach would be to use the classic apply to create a flag (df here is the dataframe, called example in the question):
df['flag'] = df.apply(lambda x: 1 if 'date' in x['col_names'] else 0, axis=1)
Afterwards I would filter:
df_filtered = df.loc[df['flag'] == 1, :]
There are probably cleverer options, but this does the job.
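A variation on the same idea, offered as a sketch rather than a definitive answer: calling apply on the col_names column alone returns a boolean mask that can be used directly, so no helper flag column is needed. The variable name mask is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'TableName': ['table1', 'table2', 'table3', 'table4', 'table5'],
    'col_names': [['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['audit', 'auditrunlist', 'date']],
})

# True where the list in col_names has an element exactly equal to 'date'.
mask = df['col_names'].apply(lambda cols: 'date' in cols)

# The boolean mask selects the matching rows without adding a column.
df_filtered = df[mask]
print(df_filtered['TableName'].tolist())  # ['table5']
```

Because the test is list membership, only an exact column name matches; a column named 'updated' would not be picked up.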
This can be achieved in multiple ways.
-- Filter with a boolean list built by a list comprehension.
example = example[['date' in i for i in example['col_names']]]
-- Expand the lists and then filter. The code looks nicer but might need more memory.
example = example.explode('col_names')
example = example[example['col_names'] == 'date']
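One thing to be aware of with the explode approach: the filtered result holds the single value 'date' in col_names rather than the original list. If the full list should be kept, one possible sketch is to use the exploded frame only to find the matching index labels and then select those rows from the original dataframe. This assumes unique index labels and a pandas version that provides DataFrame.explode (0.25 or later); the names exploded and matching_labels are illustrative.

```python
import pandas as pd

example = pd.DataFrame({
    'TableName': ['table1', 'table2', 'table3', 'table4', 'table5'],
    'col_names': [['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['age', 'name', 'school'],
                  ['audit', 'auditrunlist', 'date']],
})

# One row per list element; the original index label is repeated.
exploded = example.explode('col_names')

# Labels of the exploded rows that match point back to the original rows.
matching_labels = exploded[exploded['col_names'] == 'date'].index.unique()

# Select from the original frame so col_names still holds the full list.
filtered = example.loc[matching_labels]
print(filtered['TableName'].tolist())  # ['table5']
```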