Fill NaN values with a random value from another DataFrame (pandas)
I have a DataFrame with millions of rows and a lot of NaN values. Here is an example:
index  Company    Area
0      Google     Technology
1      Coca Cola  Drinks
2      NaN        Drinks
3      Apple      Technology
4      NaN        Technology
5      Gatorade   Drinks
6      Dell       Technology
7      Apple      Technology
8      Coca Cola  Drinks
9      NaN        Drinks
10     Google     Technology
My idea is to fill the NaN values in Company with one of the two most common companies for that row's Area.
For example, if the most frequent companies in the Technology area are Apple and Google, I would like to fill the NaN values in rows where df['Area'] == 'Technology' with one of those two values, chosen at random.
I've already created a grouped DataFrame with the most common values; it looks like this:
Area        Company
Technology  Google
Technology  Apple
Drinks      Coca Cola
Drinks      Pepsi
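The question doesn't show how the table of most common values was built. One way to get it, sketched here assuming `df` has `Company` and `Area` columns, is to count companies per Area with `value_counts`, sort by count, and keep the first two rows of each Area. The name `df1` is just chosen to match the answer below. With only the sample rows, Drinks yields Coca Cola and Gatorade rather than Pepsi (the table above presumably comes from the full data), and when two companies tie for second place, which one is kept is not guaranteed.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a missing company)
df = pd.DataFrame({
    'Company': ['Google', 'Coca Cola', np.nan, 'Apple', np.nan, 'Gatorade',
                'Dell', 'Apple', 'Coca Cola', np.nan, 'Google'],
    'Area': ['Technology', 'Drinks', 'Drinks', 'Technology', 'Technology',
             'Drinks', 'Technology', 'Technology', 'Drinks', 'Drinks',
             'Technology'],
})

# Count each company within its Area; value_counts drops NaN by default
counts = df.groupby('Area')['Company'].value_counts()

# Sort by count (descending), then keep the first two rows of each Area
top2 = counts.sort_values(ascending=False).groupby(level='Area').head(2)

# Flatten the (Area, Company) MultiIndex into an Area/Company table
df1 = top2.reset_index(name='count')[['Area', 'Company']]
print(df1)
```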
The result should look something like this:
index  Company    Area
0      Google     Technology
1      Coca Cola  Drinks
2      Pepsi      Drinks
3      Apple      Technology
4      Google     Technology
5      Gatorade   Drinks
6      Dell       Technology
7      Apple      Technology
8      Coca Cola  Drinks
9      Pepsi      Drinks
10     Google     Technology
I hope you can help me.
Thanks!
I came up with this solution using random.choice:
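import random
# One list of candidate companies per Area, repeated once per row of df
# (in df's row order), then one candidate drawn at random for each row
s = (df1.groupby('Area')['Company'].apply(list)
        .reindex(df['Area'])
        .apply(random.choice))
s.index = df.index  # give s df's index so fillna aligns it row by row
df['Company'] = df['Company'].fillna(s)  # only the NaN entries are replaced
df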
Out[200]:
    index   Company        Area
0       0    Google  Technology
1       1  CocaCola      Drinks
2       2  CocaCola      Drinks
3       3     Apple  Technology
4       4    Google  Technology
5       5  Gatorade      Drinks
6       6      Dell  Technology
7       7     Apple  Technology
8       8  CocaCola      Drinks
9       9     Pepsi      Drinks
10     10    Google  Technology
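The answer's `apply(random.choice)` runs a Python call for every row of `df` (not just the NaN rows), which can be slow on millions of rows. It also raises a TypeError if some Area has no entry in `df1`, because `reindex` then yields NaN instead of a list. A vectorized alternative, which is a different technique from the answer's, is to loop over the handful of Areas and draw all replacements for each Area in one call to NumPy's `Generator.choice`. This is a sketch only: the function name `fill_from_candidates` and its `seed` parameter are hypothetical, and it assumes `df1` has `Area` and `Company` columns like the table in the question.

```python
import numpy as np
import pandas as pd


def fill_from_candidates(df, candidates, seed=None):
    """Hypothetical helper: return a copy of df whose NaN Company values are
    replaced by a random pick from the candidates listed for that row's Area.

    candidates: DataFrame with 'Area' and 'Company' columns (like df1).
    Rows whose Area has no candidates keep their NaN.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    missing = out['Company'].isna()
    # One list of candidate companies per Area
    pools = candidates.groupby('Area')['Company'].apply(list)
    for area, pool in pools.items():
        mask = missing & (out['Area'] == area)
        n = int(mask.sum())
        if n:
            # Draw all n replacements for this Area in a single vectorized call
            out.loc[mask, 'Company'] = rng.choice(pool, size=n)
    return out


df = pd.DataFrame({
    'Company': ['Google', 'Coca Cola', np.nan, 'Apple', np.nan, 'Gatorade',
                'Dell', 'Apple', 'Coca Cola', np.nan, 'Google'],
    'Area': ['Technology', 'Drinks', 'Drinks', 'Technology', 'Technology',
             'Drinks', 'Technology', 'Technology', 'Drinks', 'Drinks',
             'Technology'],
})
df1 = pd.DataFrame({
    'Area': ['Technology', 'Technology', 'Drinks', 'Drinks'],
    'Company': ['Google', 'Apple', 'Coca Cola', 'Pepsi'],
})

filled = fill_from_candidates(df, df1, seed=0)
print(filled)
```

Passing a fixed `seed` makes the random fill reproducible between runs; leave it as `None` for a fresh draw each time.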