python pandas normalize column with keras, then splitting to groups
I have the following data frame (the actual data frame contains multiple string and numeric columns):
col1 col2
0 A 10
1 A 10
2 B 5
3 B 5
I want to normalize the data based on column values so the result would look like this:
col1 col2
0 A 0.632456
1 A 0.632456
2 B 0.316228
3 B 0.316228
And then split it into groups to get:
col1 col2
0 A 0.632456
1 A 0.632456
col1 col2
0 B 0.316228
1 B 0.316228
Splitting into groups is easy; however, I'm struggling with the normalization. I've tried using the following code:
import pandas as pd
from keras.utils import normalize
df = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": [10, 10, 5, 5]})
normalize(df, axis=0)
But it fails because of the string column; it would work if the values A and B were numeric.
Q: How can I normalize the numeric values by column without dropping the string columns, so that I can group by them later?
When dealing with categorical data, you should be looking at encoding methods such as OneHotEncoder. It doesn't make sense to try to normalize these columns directly. In this case, you could use a scaler such as MinMaxScaler (or keras' normalize) for the numerical columns, and then one-hot encode the categorical columns as:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
sc = MinMaxScaler()
oh = OneHotEncoder()
# Both transformers expect 2D input, hence the [:, None] reshape
col2_norm = sc.fit_transform(df.col2.to_numpy()[:, None])
col1_one_hot = oh.fit_transform(df.col1.to_numpy()[:, None]).toarray()
np.concatenate([col1_one_hot, col2_norm], axis=1)
array([[1., 0., 1.],
[1., 0., 1.],
[0., 1., 0.],
[0., 1., 0.]])
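For comparison, the same encoding can be sketched in plain pandas so that the result keeps its column labels. This is only an illustration, not part of the original answer: `pd.get_dummies` stands in for OneHotEncoder, and the min-max formula `(x - min) / (max - min)` is written out by hand, which assumes the numeric column is not constant.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": [10, 10, 5, 5]})

# One-hot encode the categorical column; the category names become column labels.
one_hot = pd.get_dummies(df["col1"], dtype=float)

# Min-max scale the numeric column by hand: (x - min) / (max - min).
# Assumption: the column is not constant, otherwise the denominator is zero.
col2 = df["col2"]
col2_scaled = (col2 - col2.min()) / (col2.max() - col2.min())

# Put the encoded and scaled columns side by side in one DataFrame.
encoded = pd.concat([one_hot, col2_scaled], axis=1)
print(encoded)
```

The resulting frame has the columns A, B and col2, with the same values as the array above.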
If you just want to normalize the numerical column, you can feed only that Series (reshaped to a 2D array) to the scaler, rather than the entire dataframe:
sc = MinMaxScaler()
df['col2'] = sc.fit_transform(df.col2.to_numpy()[:,None])
Or similarly with keras' normalize:
df['col2'] = normalize(df.col2.to_numpy()).squeeze()
print(df)
col1 col2
0 A 1.0
1 A 1.0
2 B 0.0
3 B 0.0
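The values the question asks for (0.632456 and 0.316228) are what you get by dividing each column by its Euclidean (L2) norm, which, as far as I know, is also what keras' normalize does by default. A minimal sketch of the whole workflow using only numpy and pandas follows, assuming no numeric column is all zeros. The names `num_cols`, `norms` and `groups` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": [10, 10, 5, 5]})

# Select the numeric columns; string columns are left untouched.
num_cols = df.select_dtypes(include="number").columns

# Euclidean (L2) norm of each numeric column.
# Assumption: no numeric column is all zeros, otherwise its norm is 0.
norms = pd.Series(
    np.linalg.norm(df[num_cols].to_numpy(dtype=float), axis=0),
    index=num_cols,
)

# Divide every numeric column by its own norm.
df[num_cols] = df[num_cols].div(norms, axis="columns")

# Split into one DataFrame per value of col1, each with a fresh 0-based index.
groups = {key: g.reset_index(drop=True) for key, g in df.groupby("col1")}
print(groups["A"])
print(groups["B"])
```

Because the string column stays in the frame, grouping after normalization works directly, and every numeric column is handled in one pass.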