
How to fill a blank country column with the most common country name in pandas (Python)

I have a data frame with the columns language, region and country. I want to use the language column to fill in missing values in the country column.

My input is:

language      region         country

english        a            canada
chinese        b            china
english        a            usa
japanese       a            japan
english        a            usa
portugese      b            portugal
english        a            null    

In the data frame above, I want to fill the null country with the country that occurs most often for that language. For example, among the English rows USA appears twice and Canada once, so USA has the highest count and should be filled in where the country is null.

Required output should be:

language      region         country

english        a            canada
chinese        b            china
english        a            usa
japanese       a            japan
english        a            usa
portugese      b            portugal
english        a            usa

I tried the code snippet below to get this output, but it does not work. Can anyone help me produce the required data frame?

df.loc[df['language']=='english' & df['region']='ap' & df['country'].value_counts()[df['country'].value_counts() == df['country'].value_counts().max()]

In the snippet above I need to use df.loc[df['language']=='english' & df['region']='ap'. After that I have to find the country with the highest count within the AP region and fill the blank country with it.

This assumes your null is NaN or None. If it is the string 'null', you need to convert it to NaN first.

df = df.where(df.ne('null')) # only needed if your `null` is the string 'null'

m = df.country.isna()
m1 = df.language.eq('english')

df.loc[m & m1, 'country'] = df.loc[m1, 'country'].mode()[0]

Out[194]:
    language region   country
0    english      a    canada
1    chinese      b     china
2    english      a       usa
3   japanese      a     japan
4    english      a       usa
5  portugese      b  portugal
6    english      a       usa
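
For reference, the steps above can be put together into a self-contained sketch. The frame is rebuilt from the question's sample data, and it is assumed that the blank arrives as the literal string 'null'. The names `missing`, `is_english` and `most_common` are chosen here for illustration only.

```python
import pandas as pd

# Sample data copied from the question; the blank is assumed to be the string 'null'.
df = pd.DataFrame({
    "language": ["english", "chinese", "english", "japanese",
                 "english", "portugese", "english"],
    "region": ["a", "b", "a", "a", "a", "b", "a"],
    "country": ["canada", "china", "usa", "japan", "usa", "portugal", "null"],
})

# Turn the placeholder string into a real missing value (NaN).
df["country"] = df["country"].mask(df["country"].eq("null"))

missing = df["country"].isna()
is_english = df["language"].eq("english")

# mode() skips missing values by default; [0] takes the first value if there is a tie.
most_common = df.loc[is_english, "country"].mode()[0]

# Fill only the rows that are both English and missing a country.
df.loc[missing & is_english, "country"] = most_common
```

If the fill should also be restricted by region, as the question describes, a further mask such as df["region"].eq("a") can be combined with `&`. When the comparisons are written inline with `==`, each one needs its own parentheses, because `&` binds more tightly than `==` in Python.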

A more general solution is to build a language-to-country mapping with groupby, then apply it with map and fillna:

d = df.groupby('language').country.apply(lambda s: s.mode()[0]).to_dict() 
df['country'] = df.country.fillna(df.language.map(d))

    language region   country
0    english      a    canada
1    chinese      b     china
2    english      a       usa
3   japanese      a     japan
4    english      a       usa
5  portugese      b  portugal
6    english      a       usa
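
The question also mentions restricting the count to a region, and `s.mode()[0]` raises an error for a group that has no known country at all. A minimal sketch of one way to handle both, assuming blanks are already NaN/None: group by language and region together, and fall back to NaN when a group has nothing to offer. The helper `first_mode` and the extra "german" row are hypothetical additions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

def first_mode(s):
    """Return the most frequent non-missing value of s, or NaN if there is none."""
    modes = s.mode()
    return modes.iloc[0] if not modes.empty else np.nan

# Sample data from the question, plus one hypothetical group with no known country.
df = pd.DataFrame({
    "language": ["english", "chinese", "english", "japanese",
                 "english", "portugese", "english", "german"],
    "region": ["a", "b", "a", "a", "a", "b", "a", "b"],
    "country": ["canada", "china", "usa", "japan",
                "usa", "portugal", None, None],
})

# For every row, look up the most common country of its (language, region) group.
fallback = df.groupby(["language", "region"])["country"].transform(first_mode)

# Keep existing countries; use the group's most common one only where missing.
df["country"] = df["country"].fillna(fallback)
```

Rows whose group has no known country (the "german" row here) simply stay missing instead of raising an error.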

