How to randomly populate a categorical column in pandas dataframe using pre-defined values

Question

I have two pandas dataframes. The first contains the names of more than 50 cities, and the second contains customer details such as name, age, gender, salary, and profession. The two dataframes share no common key and differ in size. I want to add a new column named 'Customer City' to the customer dataframe, filled with values drawn from the cities dataframe. In other words, for each customer I want to pick a random city from the cities dataframe and store it in 'Customer City'.

Please suggest how this can be done in pandas.

Answer 1

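Just select them from cities with NumPy's random choice. I'm not sure what the cities dataframe looks like, so you might have to change that bit to work with what you have: np.random.choice expects a one-dimensional array, so pass the single column that holds the city names rather than the whole dataframe. It samples with replacement by default, so the same city can be assigned to more than one customer.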

import numpy as np

df["Customer City"] = np.random.choice(cities, len(df))
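As a fuller illustration of the same idea, here is a minimal self-contained sketch. The dataframe contents, the column names "City" and "Name", and the variable name customers are assumptions made up for the example, since the question does not show the real data. It uses NumPy's Generator API (np.random.default_rng) with a seed so the result is reproducible; the one-liner above works the same way without it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two dataframes described in the question.
cities = pd.DataFrame({"City": ["Delhi", "Mumbai", "Pune", "Chennai"]})
customers = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen", "Dia", "Eli", "Fay"],
        "Age": [31, 45, 27, 52, 38, 29],
    }
)

# Seeded generator so repeated runs give the same assignment.
rng = np.random.default_rng(seed=0)

# Draw one city per customer row, with replacement (the default),
# from the 1-D array of city names.
customers["Customer City"] = rng.choice(
    cities["City"].to_numpy(), size=len(customers)
)

print(customers)
```

Every value in the new column comes from the cities column, and because sampling is with replacement it does not matter that the two dataframes have different sizes.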

solution1
3 ACCEPTED 2020-05-17 17:00:49