Python pandas: normalize columns using keras, then split into groups

Question

Given the following data frame (the actual data frame contains multiple string and numeric columns):

  col1  col2
0    A    10
1    A    10
2    B     5
3    B     5

I want to normalize the data based on column values so the result would look like this:

    col1    col2
0   A           0.632456
1   A           0.632456
2   B           0.316228
3   B           0.316228

And then split it into groups to get:

    col1    col2
0   A           0.632456
1   A           0.632456

    col1    col2
0   B           0.316228
1   B           0.316228

Splitting into groups is easy; however, I'm struggling with the normalization. I've tried the following code:

import pandas as pd
from keras.utils import normalize
df = pd.DataFrame({"col1":["A","A","B","B"],"col2":[10,10,5,5]})
normalize(df, axis=0)

But it fails because the data frame contains strings; it would work if the values in col1 (A and B) were numeric.

Q: How can I normalize the numeric values column by column without dropping the string columns, so that I can group by them later?

Answer 1

When dealing with categorical data, you should be looking at encoding methods such as OneHotEncoder. It doesn't make sense to try to normalize these columns directly. In this case, you could use a scaler such as MinMaxScaler for the numerical columns (or keras' normalize), and then one-hot encode the categorical columns:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

sc = MinMaxScaler()
oh = OneHotEncoder()

col2_norm = sc.fit_transform(df.col2.to_numpy()[:,None])
col1_one_hot = oh.fit_transform(df.col1.to_numpy()[:,None]).toarray()

np.concatenate([col1_one_hot, col2_norm], axis=1)
array([[1., 0., 1.],
       [1., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.]])

If you just want to normalize the numerical column, you can feed only that column (reshaped to 2-D) to the scaler, rather than the entire dataframe:

sc = MinMaxScaler()
df['col2'] = sc.fit_transform(df.col2.to_numpy()[:,None])
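Note that min-max scaling and L2 normalization give different numbers. The following is a minimal sketch in plain pandas, not the scikit-learn or keras implementation, that computes both by hand on the question's data; the variable names `minmax` and `l2` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": [10, 10, 5, 5]})

col = df["col2"].astype(float)

# Min-max scaling: (x - min) / (max - min).
# A constant column would make the span zero, so fall back to zeros there.
span = col.max() - col.min()
minmax = (col - col.min()) / span if span != 0 else col * 0.0

# L2 normalization: x / sqrt(sum(x ** 2)).
l2 = col / (col ** 2).sum() ** 0.5

print(minmax.tolist())
print(l2.round(6).tolist())
```

Min-max scaling maps the column onto [0, 1], while the L2 version reproduces the 0.632456 / 0.316228 values the question asks for.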

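Or similarly with keras' normalize, which divides by the L2 norm instead (giving 0.632456 and 0.316228 for this data):

df['col2'] = normalize(df.col2.to_numpy()).squeeze()

Printing the result of the MinMaxScaler version:

print(df)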

  col1  col2
0    A   1.0
1    A   1.0
2    B   0.0
3    B   0.0
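To cover the full question (several numeric columns, string columns kept, then a split into groups), the steps can be combined as below. This is a sketch under assumptions: it uses NumPy's column-wise L2 norm in place of keras' normalize, selects numeric columns with `select_dtypes`, and the names `num_cols`, `normalized`, and `groups` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": [10, 10, 5, 5]})

# Pick only the numeric columns; string columns such as col1 stay untouched.
num_cols = df.select_dtypes(include="number").columns

# Column-wise L2 norm; replace zero norms by 1 to avoid division by zero.
norms = np.linalg.norm(df[num_cols].to_numpy(dtype=float), axis=0)
norms[norms == 0] = 1.0

normalized = df.copy()
normalized[num_cols] = df[num_cols].astype(float) / norms

# One DataFrame per value of col1, each with its index reset to start at 0.
groups = {key: g.reset_index(drop=True) for key, g in normalized.groupby("col1")}

for key, g in groups.items():
    print(key)
    print(g)
```

Normalizing before grouping means the norm is computed over the whole column, which matches the expected output in the question; normalizing within each group would give different values.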

Answer 1 (accepted, 2020-04-22 08:23:43)
