I have the following dataframe called "files_to_export":
|Assignee |otherColumns...|
["Samsung", "Apple", "Apple Inc."]
["Honda Tech", "Honda Motors", "General Motors", "Huawei"]
I have another list called "Companies" that contains the companies I want to keep in my data. The list has the following structure:
Companies=['Ford','General motors','Mazda',..........]
I want to keep the rows of my data that contain at least one company from my company list. By "contain" I mean containing in the regex sense: if there is a row with "Ford global tech.", I want it included in my data because it has the word Ford.
I wrote the following code, but it doesn't capture any data:
output = file_to_export[file_to_export['Assignee'].str.contains('|'.join(companies), case=False, na=False).count(True) > 0]
The actual result is an empty output dataframe with no rows.
The expected result is an output dataframe containing the rows that match the different companies.
Any suggestions? Thanks for your help, and I hope my question is clear!
# Setup of data
import pandas as pd

files_to_export = pd.DataFrame({'Assignee': [['Samsung', 'Apple', 'Apple Inc.'],
                                             ['Honda Tech', 'Honda Motors', 'General Motors']],
                                'other_col': [1, 2]})
companies = ['Ford', 'General motors', 'Mazda']
# Filter df
# The pattern is an "or" of the names: matching any one of the individual strings is enough
pattern = '|'.join(companies)  # 'Ford|General motors|Mazda'
# Convert the column of lists to a column of comma-separated strings,
# then check for string containment
files_to_export[files_to_export.Assignee.apply(lambda x: ','.join(x))
                .str.contains(pattern, case=False)]
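A variation on the same idea, offered as a sketch rather than the definitive approach: if a company name contains regex metacharacters (for example the "." in "Apple Inc."), joining the raw names into a pattern can match more than intended. The version below escapes each name with re.escape and tests each list element directly instead of joining the list into one string. The helper name mentions_company and the variable mask are made up for illustration; the data is the same sample used in the setup above.

```python
import re
import pandas as pd

# Same sample data as in the setup above
files_to_export = pd.DataFrame({
    'Assignee': [['Samsung', 'Apple', 'Apple Inc.'],
                 ['Honda Tech', 'Honda Motors', 'General Motors']],
    'other_col': [1, 2],
})
companies = ['Ford', 'General motors', 'Mazda']

# Escape each name so characters such as '.' are matched literally,
# then combine the names into one case-insensitive alternation
regex = re.compile('|'.join(re.escape(c) for c in companies),
                   flags=re.IGNORECASE)

def mentions_company(assignees):
    # Hypothetical helper: True if any name in the list matches the pattern.
    # Cells that are not lists (e.g. NaN) are treated as non-matching.
    if not isinstance(assignees, (list, tuple)):
        return False
    return any(regex.search(name) for name in assignees)

# Boolean mask with one entry per row, used to filter the dataframe
mask = files_to_export['Assignee'].apply(mentions_company)
output = files_to_export[mask]
```

With the sample data, only the second row is kept, because "General Motors" matches "General motors" when case is ignored. Testing the elements one by one also avoids accidental matches that span the comma between two joined names.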