Creating a new column in Dataframe based on multiple lists

Question

I'm trying to create a new column 'BroadCategory' in a dataframe based on whether the values in another column, 'VenueCategory', occur in specific lists. I have 5 lists that I am using to fill in the values in the new column.

For example:

df['BroadCategory'] = np.where(df['VenueCategory'].isin(Bar),'Bar','Other') 
df['BroadCategory'] = np.where(df['VenueCategory'].isin(Museum_ArtGallery),'Museum/Art Gallery','Other')
df['BroadCategory'] = np.where(df['VenueCategory'].isin(Public_Transport),'Public Transport','Other')
df['BroadCategory'] = np.where(df['VenueCategory'].isin(Restaurant_FoodVenue),'Restaurant/Food Venue','Other')

I ultimately want the values in VenueCategory column occurring in the list Bar to be labeled 'Bar' and those occurring in the list Museum_ArtGallery to be labeled 'Museum_ArtGallery', etc. My code above doesn't accomplish this.

I tried this in order to keep the values I had previously filled but it's still overwriting the values I had filled in based on my previous conditions:

df['BroadCategory'] = np.where(df[df.VenueCategory!='Other'].isin(Entertainment_Venue),'Entertainment Venue','Other')

How can I fill the column BroadCategory with the specific values based on whether the values in the VenueCategory column occur in the specified lists Bar, Restaurant_FoodVenue, Public_Transport, Museum_ArtGallery, etc.?

Answer 1

Suppose your data looks like this:

import pandas as pd

df = pd.DataFrame({'VenueCategory': ['drink', 'wine', 'MOMA', 'MTA', 'sushi', 'Hudson']})
Bar = ['drink', 'wine', 'alcohol']
Museum_ArtGallery = ['MOMA', 'MCM']
Public_Transport = ['MTA', 'MBTA']
Restaurant_FoodVenue = ['sushi', 'chicken']

Prepare a dictionary that maps each venue to its broad category, with 'other' as the default for unknown venues:

from collections import defaultdict
d=defaultdict(lambda:'other')
d.update({x:'Bar' for x in Bar})
d.update({x:'Museum_ArtGallery' for x in Museum_ArtGallery})
d.update({x:'Public_Transport' for x in Public_Transport})
d.update({x:'Restaurant_FoodVenue' for x in Restaurant_FoodVenue})

Build the new column and print the result:

df['BroadCategory']=df['VenueCategory'].apply(lambda x:d[x])
df
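
As a variation on the dictionary lookup above, here is a minimal sketch that uses a plain dict with Series.map and fills unmatched venues afterwards. The sample values are the illustrative ones from this answer, and the names groups and lookup are made up for the example:

```python
import pandas as pd

# Illustrative sample data, same shape as in the answer above.
df = pd.DataFrame({'VenueCategory': ['drink', 'wine', 'MOMA', 'MTA', 'sushi', 'Hudson']})
Bar = ['drink', 'wine', 'alcohol']
Museum_ArtGallery = ['MOMA', 'MCM']
Public_Transport = ['MTA', 'MBTA']
Restaurant_FoodVenue = ['sushi', 'chicken']

# Hypothetical helper dict: broad label -> list of venue categories.
groups = {
    'Bar': Bar,
    'Museum_ArtGallery': Museum_ArtGallery,
    'Public_Transport': Public_Transport,
    'Restaurant_FoodVenue': Restaurant_FoodVenue,
}

# Invert it into venue category -> broad label.
lookup = {venue: label for label, venues in groups.items() for venue in venues}

# map() returns NaN for venues that are not in the lookup; fillna supplies the default.
df['BroadCategory'] = df['VenueCategory'].map(lookup).fillna('Other')
print(df)
```

If a venue appears in more than one list, the label from the list that comes last in groups wins, so the order of the dict entries matters in that case.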

Answer 2

venue_list = [
    ['Bar', Bar],
    ['Museum_ArtGallery', Museum_ArtGallery],
    # etc.
]
venue_lookup = pd.concat([
    pd.DataFrame({'BroadCategory': venue[0], 'VenueCategory': venue[1]})
    for venue in venue_list
])
pd.merge(df, venue_lookup, how='left', on='VenueCategory')
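
Note that a left merge leaves NaN in BroadCategory for venues that appear in none of the lists, and the merged frame has to be assigned to a name to be kept. A minimal sketch of those two extra steps, using illustrative sample data and only two of the lists (the name result is an assumption for the example):

```python
import pandas as pd

# Illustrative sample data; only two of the category lists are shown.
df = pd.DataFrame({'VenueCategory': ['drink', 'wine', 'MOMA', 'MTA', 'sushi', 'Hudson']})
Bar = ['drink', 'wine', 'alcohol']
Museum_ArtGallery = ['MOMA', 'MCM']

venue_list = [['Bar', Bar], ['Museum_ArtGallery', Museum_ArtGallery]]

# One row per (broad label, venue category) pair.
venue_lookup = pd.concat(
    [pd.DataFrame({'BroadCategory': label, 'VenueCategory': venues})
     for label, venues in venue_list],
    ignore_index=True,
)

# validate='many_to_one' raises if a venue category occurs in more than one list,
# which would otherwise silently duplicate rows of df.
result = pd.merge(df, venue_lookup, how='left', on='VenueCategory',
                  validate='many_to_one')

# Venues found in no list come back as NaN; replace them with the default label.
result['BroadCategory'] = result['BroadCategory'].fillna('Other')
print(result)
```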

Answer 3

Your solution is already close. To avoid overwriting values you set earlier, select only the matching subset of rows each time and assign the new value to that subset alone.

To do that, first initialize the new column BroadCategory to 'Other'. Then, for each category, select the matching rows with a Boolean mask built from the .isin() function you are already using, and assign the label to those rows only. The code looks like this:

df['BroadCategory'] = 'Other'
# Use .loc rather than chained indexing (df[col][mask] = ...), which can assign to a copy.
df.loc[df['VenueCategory'].isin(Bar), 'BroadCategory'] = 'Bar'
df.loc[df['VenueCategory'].isin(Museum_ArtGallery), 'BroadCategory'] = 'Museum/Art Gallery'
df.loc[df['VenueCategory'].isin(Public_Transport), 'BroadCategory'] = 'Public Transport'
df.loc[df['VenueCategory'].isin(Restaurant_FoodVenue), 'BroadCategory'] = 'Restaurant/Food Venue'
df.loc[df['VenueCategory'].isin(Entertainment_Venue), 'BroadCategory'] = 'Entertainment Venue'
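
The per-category assignments above can also be collapsed into a single call to np.select, which takes a list of Boolean masks, a matching list of labels, and a default for rows that match none of them. A minimal sketch, assuming illustrative sample lists (Entertainment_Venue is left out because its contents are not given in the question):

```python
import numpy as np
import pandas as pd

# Illustrative sample data; the real lists come from the question.
df = pd.DataFrame({'VenueCategory': ['drink', 'wine', 'MOMA', 'MTA', 'sushi', 'Hudson']})
Bar = ['drink', 'wine', 'alcohol']
Museum_ArtGallery = ['MOMA', 'MCM']
Public_Transport = ['MTA', 'MBTA']
Restaurant_FoodVenue = ['sushi', 'chicken']

# One Boolean mask per category, in priority order.
conditions = [
    df['VenueCategory'].isin(Bar),
    df['VenueCategory'].isin(Museum_ArtGallery),
    df['VenueCategory'].isin(Public_Transport),
    df['VenueCategory'].isin(Restaurant_FoodVenue),
]
labels = ['Bar', 'Museum/Art Gallery', 'Public Transport', 'Restaurant/Food Venue']

# For each row, the label of the first matching condition is used;
# rows that match no condition get the default.
df['BroadCategory'] = np.select(conditions, labels, default='Other')
print(df)
```

Unlike the sequential assignments, where a later match overwrites an earlier one, np.select keeps the first match, so the order of conditions decides ties when a venue sits in more than one list.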

3 answers

solution1
0 ACCEPTED 2021-04-27 04:53:08

solution2
0 2021-04-27 05:23:03

solution3
0 2021-04-27 05:37:56
