How to create a new data frame
Shape of passed values is (1000, 10), indices imply (1000, 11)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS',axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
df_feat = pd.DataFrame(scaled_features,columns=df.columns)
The error "Shape of passed values is (1000, 10), indices imply (1000, 11)" occurs on this line:
df_feat = pd.DataFrame(scaled_features,columns=df.columns)
because scaled_features has 10 columns, but df.columns has length 11.
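The mismatch can be reproduced without the scaler at all. The following is a minimal sketch on mock data (the random values, the column names f0 to f9, and the variable names are assumptions, not the asker's dataset); it only shows that a 10-column array cannot be paired with 11 column labels.

```python
import numpy as np
import pandas as pd

# Mock data (assumption): 10 feature columns plus a 'TARGET CLASS' column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 10)), columns=[f"f{i}" for i in range(10)])
df["TARGET CLASS"] = rng.integers(0, 2, size=1000)

# Dropping the target leaves 10 columns of values...
values = df.drop("TARGET CLASS", axis=1).to_numpy()
print(values.shape)     # number of rows and columns in the array
print(len(df.columns))  # ...but df.columns still has 11 labels.

# Pairing the 10-column array with 11 labels raises a ValueError.
try:
    pd.DataFrame(values, columns=df.columns)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(exc)
```

Comparing values.shape[1] with len(df.columns) before building the frame is a quick way to spot this kind of problem.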
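Note that you use df.drop('TARGET CLASS',axis=1) to remove the TARGET CLASS column from df. That appears to be the extra column in df that you want to leave out of the new list of columns.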
You can fix this by saving a reference to df.drop('TARGET CLASS',axis=1) (call it df_minus_target) and using df_minus_target.columns as the new list of columns:
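import pandas as pd
from sklearn.preprocessing import StandardScaler
# df is assumed to be the original data frame, including 'TARGET CLASS'.
scaler = StandardScaler()
# Keep the feature-only frame so its columns can be reused below.
df_minus_target = df.drop('TARGET CLASS', axis=1)
scaler.fit(df_minus_target)
scaled_features = scaler.transform(df_minus_target)
# 10 columns of values, 10 column labels.
df_feat = pd.DataFrame(scaled_features, columns=df_minus_target.columns)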
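When extracting the columns to create the df_feat data frame, you forgot to drop the last column from df. The call should be pd.DataFrame(scaled_features,columns=df.drop('TARGET CLASS',axis=1).columns). See the full reproducible example below: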
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS',axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
df_feat = pd.DataFrame(scaled_features,columns=df.drop('TARGET CLASS',axis=1).columns)
print(df_feat)
Or, to prevent this kind of error in the future, first extract the feature columns you want to use into a separate data frame:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)
# Extract raw features columns first.
df_feat = df.drop('TARGET CLASS', axis=1)
# Do transformations.
scaler = StandardScaler()
scaler.fit(df_feat)
scaled_features = scaler.transform(df_feat)
df_feat_scaled = pd.DataFrame(scaled_features, columns=df_feat.columns)
print(df_feat_scaled)
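A slightly shorter variant is sketched below, under the same mock-data assumptions (the column names f0 to f9 are made up for illustration). It uses StandardScaler.fit_transform to fit and transform in one call, and also passes the feature frame's index, so the scaled frame lines up with the original rows if df has a non-default index. This is one possible arrangement, not the only way to do it.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Mock dataset (assumption): 10 feature columns plus 'TARGET CLASS'.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 10)), columns=[f"f{i}" for i in range(10)])
df["TARGET CLASS"] = [1, 1, 0, 0, 1]

# Extract the feature columns once and reuse them everywhere.
df_feat = df.drop("TARGET CLASS", axis=1)

# fit_transform() fits the scaler and transforms the data in one call.
scaled_features = StandardScaler().fit_transform(df_feat)

# Reuse both the column labels and the row index of the feature frame.
df_feat_scaled = pd.DataFrame(
    scaled_features, columns=df_feat.columns, index=df_feat.index
)
print(df_feat_scaled.shape)
```

Because the values and the labels both come from df_feat, the column counts cannot drift apart.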