I want to find the name of the column in a dataframe ("categories") that contains a given string.
categories
  Groceries Electricity   Fastfood Parking
0      SHOP   ELCOMPANY  MCDONALDS    park
1    MARKET       ELECT     Subway     car
2    market      electr  Restauran     247
Say I want to search this entire dataframe for the string "MCDO". The answer should be "Fastfood". I tried using str.contains, but it doesn't seem to work on dataframes.
How can I achieve this? Thank you.
You can check with str.contains combined with any:
df.apply(lambda x : x.str.contains('MCDO')).any().loc[lambda x : x].index
Index(['Fastfood'], dtype='object')
Or use (with numpy imported as np):
print(df.apply(lambda x: x.str.contains('MCDO')).replace(False,np.nan).dropna(axis=1,how='all').columns.item())
Output:
Fastfood
If you can search for the entire string, it is easier:
(df == 'MCDONALDS').any().idxmax()
Otherwise, use apply:
df.apply(lambda x: x.str.startswith('MCDO').any()).idxmax()
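One caveat with the idxmax approach: when no column matches, every value is False and idxmax still returns a label (the first one), so a miss can look like a hit. A minimal sketch of a guarded variant follows; the helper name `first_column_with_prefix` is hypothetical, and the sample data is rebuilt from the question on the assumption that all cells are strings.

```python
import pandas as pd

# Sample data reconstructed from the question (all values assumed to be strings).
df = pd.DataFrame({
    "Groceries": ["SHOP", "MARKET", "market"],
    "Electricity": ["ELCOMPANY", "ELECT", "electr"],
    "Fastfood": ["MCDONALDS", "Subway", "Restauran"],
    "Parking": ["park", "car", "247"],
})

def first_column_with_prefix(frame, prefix):
    """Return the first column with a cell starting with prefix, or None."""
    # One boolean per column: does any cell in it start with the prefix?
    hits = frame.apply(lambda col: col.astype(str).str.startswith(prefix).any())
    # Only trust idxmax when at least one column actually matched.
    return hits.idxmax() if hits.any() else None

print(first_column_with_prefix(df, "MCDO"))
print(first_column_with_prefix(df, "zzz"))
```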
One can also use a for loop for this:
def strfinder(df, mystr):
    # Return the first column that has a cell containing mystr
    for col in df:
        for item in df[col]:
            if mystr in item:
                return col

print(strfinder(df, 'MCDO'))
To get all columns that contain the string, e.g. in the modified dataframe below:
  Groceries Electricity   Fastfood Parking
0      SHOP   ELCOMPANY  MCDONALDS    park
1    MARKET       MCDON     Subway     car
2    market      electr  Restauran     247
one can use a list comprehension:
mystr = 'MCDO'
outlist = [col
           for col in df
           for item in df[col]
           if mystr in item]
print(outlist)
Output:
['Electricity', 'Fastfood']
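Note that the nested comprehension adds a column name once per matching cell, so a column with two matching rows would appear twice. If each column should be listed only once, a minimal sketch using the built-in any() could look like this. The data is the modified dataframe rebuilt by hand, and wrapping each cell in str() is an added assumption to tolerate non-string cells.

```python
import pandas as pd

# Modified sample data from the answer (all values assumed to be strings).
df = pd.DataFrame({
    "Groceries": ["SHOP", "MARKET", "market"],
    "Electricity": ["ELCOMPANY", "MCDON", "electr"],
    "Fastfood": ["MCDONALDS", "Subway", "Restauran"],
    "Parking": ["park", "car", "247"],
})

mystr = "MCDO"
# any() stops at the first matching cell, so each column is listed at most once.
outlist = [col for col in df if any(mystr in str(item) for item in df[col])]
print(outlist)
```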