How to fill a blank country column with the country name in pandas (Python)
I have a data frame with the columns language, region and country. I want to use the language column to fill in missing values in the country column.
My input is:
language region country
english a canada
chinese b china
english a usa
japanese a japan
english a usa
portugese b portugal
english a null
In the data frame above, I want to fill the null country with the country that appears most often among the rows whose language is English. Here usa appears 2 times and canada appears 1 time, so usa has the highest count and should replace the null.
Required output:
language region country
english a canada
chinese b china
english a usa
japanese a japan
english a usa
portugese b portugal
english a usa
To get this output I tried the code snippet below, but it does not work. Can anyone help me produce the required data frame?
df.loc[df['language']=='english' & df['region']='ap' & df['country'].value_counts()[df['country'].value_counts() == df['country'].value_counts().max()]
In this snippet I must keep the filter df.loc[df['language']=='english' & df['region']='ap'. After that, I have to find the country with the highest count within the AP region and fill the blank country with it.
Assume your null is NaN or None. If it is the string null, you need to pre-process it to NaN first:
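df = df.where(df.ne('null'))  # only needed if your null is the string 'null'
m = df.country.isna()  # rows with a missing country
m1 = df.language.eq('english')  # rows whose language is english
# fill the missing english rows with the most frequent english country
df.loc[m & m1, 'country'] = df.loc[m1, 'country'].mode()[0]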
Out[194]:
language region country
0 english a canada
1 chinese b china
2 english a usa
3 japanese a japan
4 english a usa
5 portugese b portugal
6 english a usa
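The question also asks to restrict the lookup by region. A minimal, self-contained sketch of that variant follows. It assumes the region value is a, as in the sample data, rather than the ap used in the question's snippet, and the names in_group, missing and top are illustrative. Each comparison gets its own parentheses because & binds more tightly than ==, and the second test uses == rather than =, which is assignment.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the null is represented as NaN here.
df = pd.DataFrame({
    "language": ["english", "chinese", "english", "japanese",
                 "english", "portugese", "english"],
    "region": ["a", "b", "a", "a", "a", "b", "a"],
    "country": ["canada", "china", "usa", "japan", "usa", "portugal", np.nan],
})

# Parenthesize each comparison before combining them with &.
in_group = (df["language"] == "english") & (df["region"] == "a")
missing = df["country"].isna()

# mode() skips NaN by default; guard against a group with no known country.
top = df.loc[in_group, "country"].mode()
if not top.empty:
    df.loc[in_group & missing, "country"] = top.iloc[0]

print(df)
```

If the real data uses a different region label, only the comparison value in in_group needs to change.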
A more generalized solution is to use map and fillna, which fills each missing country with the most frequent country for that row's language:
d = df.groupby('language').country.apply(lambda s: s.mode()[0]).to_dict()
df['country'] = df.country.fillna(df.language.map(d))
language region country
0 english a canada
1 chinese b china
2 english a usa
3 japanese a japan
4 english a usa
5 portugese b portugal
6 english a usa
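One caveat with s.mode()[0]: if some language has no known country at all, mode() returns an empty Series and the [0] lookup raises an error. The sketch below is one possible way to tolerate that case, not part of the original answer. The helper first_mode and the extra korean row are hypothetical, added only to show the edge case.

```python
import numpy as np
import pandas as pd

def first_mode(s: pd.Series):
    """Return the most frequent non-null value of s, or NaN if there is none."""
    modes = s.mode()
    return modes.iloc[0] if not modes.empty else np.nan

# Hypothetical data: the korean row has no known country to borrow from.
df = pd.DataFrame({
    "language": ["english", "english", "english", "english", "korean"],
    "region": ["a", "a", "a", "a", "b"],
    "country": ["canada", "usa", "usa", np.nan, np.nan],
})

# Build a language -> most frequent country lookup, one group at a time.
per_language = {
    lang: first_mode(countries)
    for lang, countries in df.groupby("language")["country"]
}

# Fill only the missing countries; rows with no candidate stay NaN.
df["country"] = df["country"].fillna(df["language"].map(per_language))

print(df)
```

When two countries tie for the highest count, this sketch simply takes the first value that mode() returns; a different tie-break rule would need to be chosen explicitly.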