Adding StandardScaler() of values as new column to DataFrame returns partly NaNs
I have a pandas DataFrame:
df['total_price'].describe()
returns
count 24895.000000
mean 216.377369
std 161.246931
min 0.000000
25% 109.900000
50% 174.000000
75% 273.000000
max 1355.900000
Name: total_price, dtype: float64
When I apply preprocessing.StandardScaler() to it:
import pandas as pd
from sklearn import preprocessing

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)
My new column with the standardized values contains some NaNs:
df[['total_price', 'new_col']].head()
total_price new_col
0 241.95 0.158596
1 241.95 0.158596
2 241.95 0.158596
3 81.95 -0.833691
4 81.95 -0.833691
df[['total_price', 'new_col']].tail()
total_price new_col
28167 264.0 NaN
28168 264.0 NaN
28176 94.0 NaN
28177 166.0 NaN
28178 166.0 NaN
What's going wrong here?
The indices in your dataframe have gaps:
28167
28168
28176
28177
28178
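One way to confirm such gaps is to compare the index against a fresh 0..n-1 range. This is a minimal sketch on a small hypothetical frame (the values and index labels below are made up for illustration, not taken from the original data):

```python
import pandas as pd

# Hypothetical frame whose index skips labels, as happens after filtering rows.
df = pd.DataFrame({"total_price": [264.0, 264.0, 94.0, 166.0]},
                  index=[0, 1, 5, 6])

# A gap-free default index is equal to RangeIndex(0..len-1).
fresh = pd.RangeIndex(len(df))
has_default_index = df.index.equals(fresh)

# Labels present in df that a fresh 0..n-1 index would not contain.
unmatched = df.index.difference(fresh)

print(has_default_index)   # False
print(list(unmatched))     # [5, 6]
```

Rows whose labels show up in `unmatched` are the ones that would receive NaN in an index-aligned assignment.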
When you call pd.DataFrame(x_scaled) you are creating a new contiguous index (0, 1, 2, ...). Because pandas aligns on index labels when you assign this as a column in the original dataframe, many rows will not have a match and are filled with NaN. You can resolve this by resetting the index in the original dataframe before the assignment ( df = df.reset_index(drop=True) ), or by giving the scaled values the original index ( pd.DataFrame(x_scaled, index=df.index) ).
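The mismatch and two possible fixes can be sketched as follows. The data and index labels are hypothetical, and the column names `misaligned`, `new_col` and `new_col_series` are chosen only for this illustration. The direct array assignment is a commonly used alternative and is not one of the fixes named in the answer.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data with a non-contiguous index.
df = pd.DataFrame({"total_price": [241.95, 81.95, 264.0, 94.0, 166.0]},
                  index=[0, 3, 28167, 28176, 28178])

# fit_transform returns a NumPy array of shape (5, 1) with no index attached.
x_scaled = StandardScaler().fit_transform(df[["total_price"]])

# Problem: the wrapper frame gets index 0..4, so only labels 0 and 3 align.
df["misaligned"] = pd.DataFrame(x_scaled)

# Fix 1: assign the array column directly; arrays are matched by position.
df["new_col"] = x_scaled[:, 0]

# Fix 2: wrap the values in a Series that carries the original index.
df["new_col_series"] = pd.Series(x_scaled[:, 0], index=df.index)

print(df["misaligned"].isna().sum())   # 3
print(df["new_col"].isna().sum())      # 0
```

Note that in the misaligned column the row labelled 3 is not NaN, but it holds the scaled value from position 3 of the array rather than the value for its own row, so label alignment can also place values on the wrong rows without any warning.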