How to create a new data frame

Shape of passed values is (1000, 10), indices imply (1000, 11)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS',axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
df_feat = pd.DataFrame(scaled_features,columns=df.columns)

The error

Shape of passed values is (1000, 10), indices imply (1000, 11)

occurs on this line:

df_feat = pd.DataFrame(scaled_features,columns=df.columns)

because scaled_features has 10 columns, but df.columns has length 11.
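The mismatch can be reproduced without scikit-learn. The following is a minimal sketch; the small random frame and the column names f0 to f9 are hypothetical stand-ins for the data in the question. It compares the number of value columns with the number of labels, then triggers the same kind of ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 10 feature columns plus a target.
df = pd.DataFrame(np.random.rand(5, 10), columns=[f"f{i}" for i in range(10)])
df["TARGET CLASS"] = [1, 1, 0, 0, 1]

# drop() returns a new frame; df itself still has all 11 columns.
values = df.drop("TARGET CLASS", axis=1).to_numpy()

# 10 columns of data, but 11 column labels.
print(values.shape[1], len(df.columns))

try:
    pd.DataFrame(values, columns=df.columns)
    mismatch_raised = False
except ValueError as exc:
    # pandas reports the shape of the values versus the shape implied by the labels.
    mismatch_raised = True
    print(exc)
```

Comparing values.shape[1] with the length of the intended column list before building the frame is a quick way to catch this early.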


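Note that df.drop('TARGET CLASS',axis=1) is called to remove the TARGET CLASS column from df. It returns a new data frame and leaves df itself unchanged, so df.columns still includes TARGET CLASS. That appears to be the extra column you want to leave out of the new list of columns.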

You can fix the problem by saving a reference to df.drop('TARGET CLASS',axis=1) (call it df_minus_target) and passing df_minus_target.columns as the new list of columns:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_minus_target = df.drop('TARGET CLASS',axis=1)
scaler.fit(df_minus_target)
scaled_features = scaler.transform(df_minus_target)
df_feat = pd.DataFrame(scaled_features,columns=df_minus_target.columns)
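A slightly more compact variant of the same fix, as a sketch under stated assumptions: the toy frame and its column names a, b, c are hypothetical. It uses fit_transform, which fits and transforms the same data in one call, and also passes the original index, which only matters if df does not use the default 0..n-1 row labels.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the question's df.
df = pd.DataFrame(np.random.rand(6, 3), columns=["a", "b", "c"])
df["TARGET CLASS"] = [1, 0, 1, 0, 1, 0]

df_minus_target = df.drop("TARGET CLASS", axis=1)

# Equivalent to calling fit() and then transform() on the same data.
scaled_features = StandardScaler().fit_transform(df_minus_target)

df_feat = pd.DataFrame(
    scaled_features,
    columns=df_minus_target.columns,  # same 3 labels as the scaled data
    index=df_minus_target.index,      # keep row labels aligned with df
)
print(df_feat.head())
```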

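You forgot to drop the last column from df when extracting the columns used to create the df_feat data frame. The call should be pd.DataFrame(scaled_features,columns=df.drop('TARGET CLASS',axis=1).columns). See the full reproducible example below: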

import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler

# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)

scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS',axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
df_feat = pd.DataFrame(scaled_features,columns=df.drop('TARGET CLASS',axis=1).columns)

print(df_feat)

Or, to prevent this kind of error in the future, first extract the feature columns you want to use into a separate data frame:

import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler

# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)

# Extract raw features columns first.
df_feat = df.drop('TARGET CLASS', axis=1)

# Do transformations.
scaler = StandardScaler()
scaler.fit(df_feat)
scaled_features = scaler.transform(df_feat)
df_feat_scaled = pd.DataFrame(scaled_features, columns=df_feat.columns)

print(df_feat_scaled)

