Fill missing categorical values using pandas?
I'd like to fill missing categorical cells with a new value per column. For example:
c1 c2 c3
a nan a
b q nan
c d nan
a p z
should become something like:
c1 c2 c3
a n1 a
b q n2
c d n2
a p z
My current problem is that I am using DictVectorizer for the categorical columns, but it leaves NaNs as-is.
Calling fillna with some unique string does what you want:
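import pandas as pd
from sklearn.feature_extraction import DictVectorizer as DV

# Small example frame with one missing nationality
categorial_data = pd.DataFrame({'sex': ['male', 'female', 'male', 'female'],
                                'nationality': ['American', 'European', float('nan'), 'European']})
print(categorial_data)
# Replace every NaN with a placeholder string so it is encoded as its own category
categorial_data = categorial_data.fillna('some_unique_string')
print('after replacement')
print(categorial_data)
encoder = DV(sparse=False)
# One dict per row, which is the input format DictVectorizer expects
encoded_data = encoder.fit_transform(categorial_data.to_dict(orient='records'))
print(encoded_data)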
This gives you:
nationality sex
0 American male
1 European female
2 NaN male
3 European female
after replacement
nationality sex
0 American male
1 European female
2 some_unique_string male
3 European female
[[ 1. 0. 0. 0. 1.]
[ 0. 1. 0. 1. 0.]
[ 0. 0. 1. 0. 1.]
[ 0. 1. 0. 1. 0.]]
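The question asks for a different fill value in each column, while the snippet above uses one string for the whole frame. A minimal sketch of the per-column variant follows, assuming the question's example frame and a hypothetical naming scheme of `missing_<column>` (neither the scheme nor the `placeholders` name comes from the original answer). It relies on `fillna` accepting a dict that maps column names to fill values.

```python
import numpy as np
import pandas as pd

# The example frame from the question, with NaN marking the missing cells.
df = pd.DataFrame({
    "c1": ["a", "b", "c", "a"],
    "c2": [np.nan, "q", "d", "p"],
    "c3": ["a", np.nan, np.nan, "z"],
})

# Hypothetical naming scheme: one placeholder per column, e.g. "missing_c2".
placeholders = {col: f"missing_{col}" for col in df.columns}

# fillna accepts a dict mapping column name -> fill value,
# so each column gets its own placeholder.
filled = df.fillna(placeholders)
print(filled)
```

Columns without missing cells, such as c1 here, are left unchanged. If a real category could collide with the placeholder name, pick a different scheme.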