How to normalize the data in a range of columns in my pandas DataFrame

Question

Suppose I have a pandas DataFrame, surveyData:

I want to normalize the data in each column by doing the following:

surveyData_norm = (surveyData - surveyData.mean()) / (surveyData.max() - surveyData.min())

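This works fine if my DataFrame contains only the columns I want to normalize. However, some of the leading columns contain string data, for example: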

Name  State  Gender  Age  Income  Height
Sam   CA     M        13   10000    70
Bob   AZ     M        21   25000    55
Tom   FL     M        30   100000   45

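I only want to normalize the Age, Income, and Height columns, but my method above does not work because of the string data in the Name, State, and Gender columns.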

Answer 1

There are several ways to perform an operation on a subset of rows or columns in pandas. One useful approach is indexing:

# Assuming same lines from your example
cols_to_norm = ['Age','Height']
survey_data[cols_to_norm] = survey_data[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

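This applies the operation only to the columns you want and assigns the result back to those columns. Alternatively, you could store the results in new, normalized columns and keep the originals.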
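As a minimal sketch of the second option (keeping the originals), the scaled values can be joined back as new columns. The `_norm` suffix and the sample frame, which copies the table from the question, are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
survey_data = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "State": ["CA", "AZ", "FL"],
    "Gender": ["M", "M", "M"],
    "Age": [13, 21, 30],
    "Income": [10000, 25000, 100000],
    "Height": [70, 55, 45],
})

cols_to_norm = ["Age", "Height"]

# Min-max scale each selected column to the [0, 1] range.
scaled = survey_data[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Hypothetical naming scheme: add Age_norm and Height_norm next to the originals.
survey_data = survey_data.join(scaled.add_suffix("_norm"))
```

The original Age and Height columns are left untouched, so the raw values remain available for later checks.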

Answer 2

I think it is better to use sklearn.preprocessing in this case, since it gives us many more scaling options. In your case, using StandardScaler, the way to do it would be:

from sklearn.preprocessing import StandardScaler
cols_to_norm = ['Age','Height']
surveyData[cols_to_norm] = StandardScaler().fit_transform(surveyData[cols_to_norm])
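Note that StandardScaler produces z-scores (zero mean, unit variance) rather than the mean/range formula from the question. As a rough sketch of what it computes, the same scaling can be written in plain pandas, assuming the population standard deviation (ddof=0), which is what StandardScaler uses. The sample frame below is a trimmed copy of the question's table.

```python
import pandas as pd

# Trimmed copy of the question's data.
surveyData = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "Age": [13, 21, 30],
    "Height": [70, 55, 45],
})

cols_to_norm = ["Age", "Height"]
subset = surveyData[cols_to_norm]

# z-score: subtract the column mean, divide by the population std (ddof=0).
# pandas defaults to ddof=1, so ddof=0 is passed explicitly here.
surveyData[cols_to_norm] = (subset - subset.mean()) / subset.std(ddof=0)
```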

Answer 3

A simple and more efficient way:
Pre-calculate the mean, max, and min.
dropna() guards against missing data.

mean_age = survey_data.Age.dropna().mean()
max_age = survey_data.Age.dropna().max()
min_age = survey_data.Age.dropna().min()

survey_data['Age'] = survey_data['Age'].apply(lambda x: (x - mean_age) / (max_age - min_age))

This approach will work...
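For reference, pandas reductions such as mean(), max(), and min() skip NaN by default (skipna=True), so the same mean normalization can be sketched without dropna() or apply(). The sample values below, including the extra row with a missing Age, are made up for illustration.

```python
import numpy as np
import pandas as pd

survey_data = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom", "Ann"],
    "Age": [13, 21, 30, np.nan],  # hypothetical missing value
    "Income": [10000, 25000, 100000, 40000],
    "Height": [70, 55, 45, 60],
})

cols_to_norm = ["Age", "Income", "Height"]
subset = survey_data[cols_to_norm]

# Column-wise mean normalization. The reductions ignore NaN,
# and a NaN input stays NaN in the result.
survey_data[cols_to_norm] = (subset - subset.mean()) / (subset.max() - subset.min())
```

Because the arithmetic is vectorized over whole columns, this avoids calling a Python lambda once per value.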

Answer 4

I think it is really nice to use the built-in functions:

# Assuming same lines from your example
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
cols_to_norm = ['Age','Height']
survey_data[cols_to_norm] = scaler.fit_transform(survey_data[cols_to_norm])

Answer 5

Min-max normalize all numeric columns with minmax_scale:

import numpy as np
from sklearn.preprocessing import minmax_scale
# cols = ['Age', 'Height']
cols = df.select_dtypes(np.number).columns
df[cols] = minmax_scale(df[cols])

Note: this keeps the index, column names, and non-numeric variables unchanged.
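If scikit-learn is not available, the same idea can be sketched with pandas alone. This is an illustrative equivalent, not the answer's own code; the frame copies the question's table. One caveat of this plain version: a constant column would have a zero range and come out as NaN (0/0).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "State": ["CA", "AZ", "FL"],
    "Gender": ["M", "M", "M"],
    "Age": [13, 21, 30],
    "Income": [10000, 25000, 100000],
    "Height": [70, 55, 45],
})

# Pick every numeric column automatically; string columns are skipped.
cols = df.select_dtypes(include="number").columns
num = df[cols]

# Min-max scale the numeric columns to the [0, 1] range.
df[cols] = (num - num.min()) / (num.max() - num.min())
```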

Answer 6

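import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# `dataset` is assumed to be your DataFrame
minmax = MinMaxScaler()

# Scale every int64 column to the [0, 1] range, one column at a time
for x in dataset.columns[dataset.dtypes == 'int64']:
    # fit_transform expects 2-D input; ravel() flattens the (n, 1) result back to 1-D
    dataset[x] = minmax.fit_transform(np.array(dataset[x]).reshape(-1, 1)).ravel()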

6 answers

Answer 1
Score: 39 (accepted), 2015-02-18 06:38:28

Answer 2
Score: 9, 2018-10-23 09:51:13

Answer 3
Score: 3, 2016-08-31 22:20:40

Answer 4
Score: 0, 2022-02-02 19:01:07

Answer 5
Score: 0, 2022-05-30 21:59:32

Answer 6
Score: -1, 2019-11-27 16:15:36
