
Creating a new column in a pandas DataFrame with str.contains gives: "Length of values does not match length of index"

I've seen many similar questions, but I still haven't found the right answer.

My df has a column df['Naam'] containing the names of all kinds of stores. I want to categorize them, for example by giving a grocery store the label 'Supermarkt' in a new column df['Type'].

I first did this:

df['Type'] = df['Naam'].str.contains('Albert')

This gives a boolean (True/False) Series.
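To make that concrete, here is a minimal sketch with made-up store names (the sample data is an assumption, not from the question). The point is that `str.contains` returns one boolean per row, with the same index as `df`. That matching length is why the result can be assigned directly as a column.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'Naam': ['Albert Heijn', 'Jumbo', 'Albert Heijn To Go']})

# One True/False value per row, aligned to df's index
mask = df['Naam'].str.contains('Albert')
print(mask.tolist())  # [True, False, True]
```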

After that I did this:

df['Type'] = df['Type'].replace({True: 'Supermarkt'})

That works, but it is not very smart: as soon as I wrote another str.contains line for another shop, every value in df['Type'] became a boolean again, of course.

Then I did this:

df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')

My idea was that I could reuse this line over and over, each time with a different part of a string.

But this line:

df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')

gives an error:

Length of values does not match length of index. I think I understand what it means, but I can't figure out why the first str.contains() gives a full Series while this one gives an error.
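A likely explanation is that the right-hand side of that line is a Python tuple with two elements: a Series and a string. It is not a Series of `len(df)` values. pandas treats the tuple as a list-like of length 2 and rejects it unless the frame happens to have exactly two rows. The following minimal reproduction uses hypothetical sample data. The exact error wording varies between pandas versions, but it contains the same phrase.

```python
import pandas as pd

# Hypothetical sample data: three rows, so a 2-element tuple cannot fit
df = pd.DataFrame({'Naam': ['Albert Heijn', 'Jumbo', 'HEMA']})

try:
    # (Series, 'Supermarkt') is a tuple of length 2, not 3 values
    df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')
except ValueError as err:
    error_message = str(err)
    print(error_message)
```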

So my question is: is there a way to change df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt') so that True becomes 'Supermarkt' and the False values either keep their current value or are replaced by something else?
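The (condition, value) pair the question is reaching for matches `numpy.where(condition, value_if_true, value_if_false)`. That call returns an array with one entry per row, so it can be assigned as a column. The following is a sketch, not the accepted answer. The store names, the fallback label 'Onbekend' and the second label 'Warenhuis' are hypothetical. Passing the existing `df['Type']` as the third argument keeps earlier labels where the new condition is False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Naam': ['Albert Heijn', 'HEMA', 'Albert Heijn To Go', 'Kiosk']})

# First pass: matches get 'Supermarkt', everything else a placeholder label
df['Type'] = np.where(df['Naam'].str.contains('Albert'), 'Supermarkt', 'Onbekend')

# Later passes: only overwrite matching rows, keep existing labels elsewhere
df['Type'] = np.where(df['Naam'].str.contains('HEMA'), 'Warenhuis', df['Type'])

print(df['Type'].tolist())
```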

Thanks in advance. Greetings Jan

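Answer: build a boolean mask and assign with .loc, which only touches the matching rows:

# boolean mask: True where 'Naam' contains 'Albert' (na=False treats missing names as non-matches)
boolean_indexer = df['Naam'].str.contains('Albert', na=False)

# set 'Type' only on matching rows; other rows keep their value (NaN if the column is new)
df.loc[boolean_indexer, 'Type'] = 'Supermarkt'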
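To reuse the .loc approach for many shops, as the question wanted, one option is to loop over a mapping from substring to label. This is a sketch under assumptions: the `categorize` helper, the `categories` dict, the default label 'Overig' and the sample names are all hypothetical. `regex=False` makes the substring match literal. If two substrings match the same name, the later entry in the dict wins.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Naam': ['Albert Heijn', 'Jumbo Utrecht', 'HEMA', 'Kiosk']})

# Hypothetical mapping: substring to look for -> category label
categories = {
    'Albert': 'Supermarkt',
    'Jumbo': 'Supermarkt',
    'HEMA': 'Warenhuis',
}

def categorize(frame, column, mapping, default='Overig'):
    """Return one label per row; later mapping entries overwrite earlier ones."""
    labels = pd.Series(default, index=frame.index)
    for substring, label in mapping.items():
        # Literal substring match; missing names count as non-matches
        mask = frame[column].str.contains(substring, regex=False, na=False)
        labels.loc[mask] = label
    return labels

df['Type'] = categorize(df, 'Naam', categories)
print(df)
```

Starting from a default label means rows that match nothing still get a meaningful value instead of NaN.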

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.