I'm new to pandas and would like to play with random text data. I am trying to add 2 new columns to a DataFrame df, filled respectively with a key (newcol1) and its value (newcol2) randomly selected from a dictionary:
countries = {'Africa':'Ghana','Europe':'France','Europe':'Greece','Asia':'Vietnam','Europe':'Lithuania'}
My df already has 2 columns, and I'd like to end up with something like this:
   Year Approved Continent   Country
0  2016      Yes    Africa     Ghana
1  2016      Yes    Europe Lithuania
2  2017       No    Europe    Greece
I can certainly use a for or while loop to fill df['Continent'] and df['Country'], but I sense .apply() and np.random.choice may provide a simpler, more pandorable solution.
Yep, you're right. You can use np.random.choice with map:
df
   Year Approved
0  2016      Yes
1  2016      Yes
2  2017       No
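import numpy as np
df['Continent'] = np.random.choice(list(countries), len(df))  # len(df) random keys, drawn with replacement
df['Country'] = df['Continent'].map(countries)  # look up each key's value in the dict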
df
   Year Approved Continent   Country
0  2016      Yes    Africa     Ghana
1  2016      Yes      Asia   Vietnam
2  2017       No    Europe Lithuania
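This picks len(df) keys at random (with replacement) from the countries key list, and then uses the countries dictionary as a mapper to find the country corresponding to each picked key. Note that a Python dict cannot hold duplicate keys: in the countries literal from the question, each later 'Europe' entry overwrites the earlier one, so the dictionary holds only three pairs and 'Europe' always maps to 'Lithuania'. That is why 'Greece' can never appear in the result.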
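A self-contained sketch of the np.random.choice + map approach, for trying it out end to end. It rebuilds the question's df and keeps the countries dictionary exactly as written; the seeded generator np.random.default_rng(0) is an addition (not part of the answer above) used only to make runs reproducible, and it needs NumPy 1.17 or later.

```python
import numpy as np
import pandas as pd

# The dictionary as written in the question (a repeated key keeps only its last value).
countries = {'Africa': 'Ghana', 'Europe': 'France', 'Europe': 'Greece',
             'Asia': 'Vietnam', 'Europe': 'Lithuania'}

# Rebuild the starting frame from the question.
df = pd.DataFrame({'Year': [2016, 2016, 2017], 'Approved': ['Yes', 'Yes', 'No']})

# Assumption: a seeded Generator instead of the global np.random state, for reproducibility.
rng = np.random.default_rng(0)

# Draw len(df) continent keys (with replacement, the default)...
df['Continent'] = rng.choice(list(countries), size=len(df))
# ...then look up each key's country through the dictionary.
df['Country'] = df['Continent'].map(countries)
print(df)
```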
You could also try using DataFrame.sample():
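import pandas as pd
df.join(
    pd.DataFrame(list(countries.items()), columns=["continent", "country"])  # one row per key/value pair
    .sample(len(df), replace=True)  # draw len(df) rows with replacement
    .reset_index(drop=True)  # renumber 0..n-1 so the join lines up with df's default index
)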
This can be made faster if your continent-country mapping is already stored as a DataFrame, since you then skip converting the dictionary on every call.
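A minimal sketch of that idea, assuming the mapping is converted to a DataFrame once and reused across calls. The names country_map and add_random_pairs are hypothetical, and assigning df.index to the sampled rows (instead of calling reset_index(drop=True)) is a small variation so the join also lines up when df does not use the default 0..n-1 index.

```python
import pandas as pd

# Effective contents of the question's dictionary (duplicate 'Europe' keys collapsed).
countries = {'Africa': 'Ghana', 'Europe': 'Lithuania', 'Asia': 'Vietnam'}

# Hypothetical: build the continent-country table once, outside any loop or function.
country_map = pd.DataFrame(list(countries.items()), columns=['continent', 'country'])


def add_random_pairs(df, mapping, seed=None):
    """Return df joined with len(df) rows sampled (with replacement) from mapping."""
    picked = mapping.sample(len(df), replace=True, random_state=seed)
    # Reuse df's own index so join() matches rows one-to-one, whatever that index is.
    picked.index = df.index
    return df.join(picked)


df = pd.DataFrame({'Year': [2016, 2016, 2017], 'Approved': ['Yes', 'Yes', 'No']})
result = add_random_pairs(df, country_map, seed=1)
print(result)
```

Only the sampling step runs per call; the dict-to-DataFrame conversion happens once.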
If you're on Python 3.6 or later, another option is random.choices():
from random import choices
import pandas as pd
df.join(
    pd.DataFrame(choices([*countries.items()], k=len(df)), columns=["continent", "country"])  # k (key, value) tuples, with replacement
)
random.choices() is similar to numpy.random.choice(), except that you can pass it a list of key-value tuple pairs, whereas numpy.random.choice() only accepts 1-D arrays.
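If you would rather stay within NumPy, one workaround for the 1-D restriction is to sample row positions instead of pairs: passing an int n to numpy.random.choice() draws from range(n), and the sampled positions can then index a 2-D array of key-value pairs. This is a sketch under that assumption, not part of the original answer; np.random.seed(0) is added only for reproducibility.

```python
import numpy as np
import pandas as pd

# Effective contents of the question's dictionary (duplicate keys already collapsed).
countries = {'Africa': 'Ghana', 'Europe': 'Lithuania', 'Asia': 'Vietnam'}
df = pd.DataFrame({'Year': [2016, 2016, 2017], 'Approved': ['Yes', 'Yes', 'No']})

# 2-D array of shape (n_pairs, 2): one (continent, country) row per dict item.
pairs = np.array(list(countries.items()))

np.random.seed(0)  # only for reproducible output
# An int first argument samples from range(len(pairs)), with replacement by default.
positions = np.random.choice(len(pairs), size=len(df))

# Integer-array indexing keeps each sampled continent and country together in one row.
picked = pd.DataFrame(pairs[positions], columns=['continent', 'country'], index=df.index)
result = df.join(picked)
print(result)
```

With the newer Generator API, rng.choice(pairs, size=len(df)) also works directly, because Generator.choice() accepts multi-dimensional input and samples along axis 0 by default.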