I'm using a pandas series and trying to convert it to one hot encoding. I'm using the describe
method in order to check how many unique categories the series has. The output is:
input['pattern'].describe(include='all')
count     9725
unique       7
top          1
freq      4580
Name: pattern, dtype: object
When I try:
x = pd.get_dummies(input['pattern'])
x.describe(include= 'all')
I get 18 classes, 12 of which are all zeros. Why did get_dummies
produce classes that did not occur even once in the input?
From a discussion in the comments, it was deduced that your column contained a mixture of strings and integers.
For example,
s = pd.Series(['0', 0, '0', '6', 6, '6', '3', '3'])
s
0    0
1    0
2    0
3    6
4    6
5    6
6    3
7    3
dtype: object
Now, calling pd.get_dummies results in multiple columns for the same category, one for the integer version and one for the string version:
pd.get_dummies(s)
   0  6  0  3  6
0  0  0  1  0  0
1  1  0  0  0  0
2  0  0  1  0  0
3  0  0  0  0  1
4  0  1  0  0  0
5  0  0  0  0  1
6  0  0  0  1  0
7  0  0  0  1  0
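If you want to confirm that a column really mixes types, one possible check is to count the Python type of each element. This is only a sketch on the toy series above; the variable name type_counts is mine, not something from your code.

```python
import pandas as pd

# Toy series from above: six strings and two integers.
s = pd.Series(['0', 0, '0', '6', 6, '6', '3', '3'])

# Map every element to its Python type and count the occurrences.
type_counts = s.map(type).value_counts()
print(type_counts)

# nunique() treats '0' and 0 as different values,
# so it counts the distinct (type, value) pairs.
print(s.nunique())
```

More than one row in type_counts means the column holds several types, and get_dummies will create a separate column for each distinct value of each type.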
The fix is to ensure that all elements are of the same type. For this case, I'd recommend converting to str.
s.astype(str).str.get_dummies()
   0  3  6
0  1  0  0
1  1  0  0
2  1  0  0
3  0  0  1
4  0  0  1
5  0  0  1
6  0  1  0
7  0  1  0
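As a sanity check, you can compare the number of dummy columns with the number of unique values after normalizing the type. The sketch below uses pd.get_dummies on the converted series, and also shows pd.to_numeric as an alternative that I am assuming fits your data (it only works if every value parses as a number). The names as_str, as_int, dummies_str and dummies_int are illustrative.

```python
import pandas as pd

s = pd.Series(['0', 0, '0', '6', 6, '6', '3', '3'])

# Option 1: make everything a string, then encode.
as_str = s.astype(str)
dummies_str = pd.get_dummies(as_str)

# Option 2 (assumes every value is numeric): make everything a number.
as_int = pd.to_numeric(s)
dummies_int = pd.get_dummies(as_int)

# With a single type, there is one column per distinct category.
print(dummies_str.shape[1], as_str.nunique())
print(dummies_int.shape[1], as_int.nunique())
```

Either way, the column count should now match what nunique reports on the converted series.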