I'd like to fill missing categorial cells with new values per column. For example:
c1 c2 c3
a nan a
b q nan
c d nan
a p z
should become something like
c1 c2 c3
a n1 a
b q n2
c d n2
a p z
My current problem is that I am using DictVectorizer for categorials column, but it leaves NaNs as-is.
Fillna with some uniq string does what you want:
categorial_data = pd.DataFrame({'sex': ['male', 'female', 'male', 'female'],
'nationality': ['American', 'European', float('nan'), 'European']})
print(categorial_data)
categorial_data=categorial_data.fillna('some_unique_string')
print('after replacement')
print(categorial_data)
encoder = DV(sparse = False)
encoded_data = encoder.fit_transform(categorial_data.T.to_dict().values())
print(encoded_data)
gives you
nationality sex
0 American male
1 European female
2 NaN male
3 European female
after replacement
nationality sex
0 American male
1 European female
2 some_unique_string male
3 European female
[[ 1. 0. 0. 0. 1.]
[ 0. 1. 0. 1. 0.]
[ 0. 0. 1. 0. 1.]
[ 0. 1. 0. 1. 0.]]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.