I've seen many similar questions, but I still haven't found the right answer.
My df has a column df['Naam'] containing the names of all kinds of stores. I want to categorize these by giving, for example, a grocery store the label 'Supermarkt' in a new column df['Type'].
I first did this:
df['Type'] = df['Naam'].str.contains('Albert')
This gives a Series of True/False values.
After that I did this:
df['Type'] = df['Type'].replace({True: 'Supermarkt'})
That works, but it is not very smart: after I wrote another str.contains line for another shop, every value in df['Type'] became a bool again.
Then I did this:
df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')
My idea was that I would be able to reuse this code over and over with a different part of the string.
But this line:
df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')
gives an error:
Length of values does not match length of index
I think I understand what it means, but I can't figure out why the first str.contains() gives a full Series and this one gives an error.
So my question is: is there a way to alter
df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')
so that True becomes 'Supermarkt' and all the False values stay in place or are replaced by something else?
Thanks in advance. Greetings Jan
# create a selection
boolean_indexer = df['Naam'].str.contains('Albert')
# create your new column
df.loc[boolean_indexer, 'Type'] = 'Supermarkt'
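To reuse this for several shops, one option is to loop over a mapping of name fragments to labels and apply the same df.loc assignment for each. This is only a sketch: the sample rows, the extra fragments ('Jumbo', 'Shell'), the 'Tankstation' label and the 'Onbekend' default are made up for illustration; only 'Albert' and 'Supermarkt' come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame(
    {'Naam': ['Albert Heijn', 'Shell Station', 'Jumbo', 'Bakkerij Jansen']}
)

# Hypothetical mapping: part of the store name -> label for df['Type'].
labels = {
    'Albert': 'Supermarkt',
    'Jumbo': 'Supermarkt',
    'Shell': 'Tankstation',
}

# Start with a default label so rows that match nothing are not left as NaN.
df['Type'] = 'Onbekend'

for fragment, label in labels.items():
    # Boolean mask of rows whose name contains this fragment.
    # na=False treats missing names as "no match";
    # regex=False matches the fragment as plain text.
    mask = df['Naam'].str.contains(fragment, na=False, regex=False)
    # Only the matching rows are overwritten; all other rows keep their value.
    df.loc[mask, 'Type'] = label

print(df)
```

Because each pass only writes to the rows selected by the mask, labels set by an earlier fragment are not turned back into booleans. If a name matches more than one fragment, the last matching entry in the mapping wins.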