Python - add a numpy array as column to a pandas dataframe with different length
I have a pandas dataframe df with multiple columns. One of the columns is Col1, which contains float values or NaNs:
df
+----+------+-----+
| No | Col1 | ... |
+----+------+-----+
| 12 | 10 | ... |
| 23 | NaN | ... |
| 34 | 5 | ... |
| 45 | NaN | ... |
| 54 | 22 | ... |
+----+------+-----+
I run a function over Col1, excluding missing values (NaN), like this:
StandardScaler().fit_transform(df.loc[pd.notnull(df['Col1']), ['Col1']])
Imagine the result is a numpy.ndarray like this:
+-----+
| Ref |
+-----+
| 2 |
| 5 |
| 1 |
+-----+
Notice that this array does not have the same length as the original column Col1.
I need a solution to add the array Ref as a column to df. For each row where Col1 is NaN, the new column Ref gets NaN too. Desired output would look like this:
+----+------+-----+-----+
| No | Col1 | ... | Ref |
+----+------+-----+-----+
| 12 | 10 | ... | 2 |
| 23 | NaN | ... | NaN |
| 34 | 5 | ... | 5 |
| 45 | NaN | ... | NaN |
| 54 | 22 | ... | 1 |
+----+------+-----+-----+
I think you can assign to a new column, filtered by the same boolean mask:
from sklearn.preprocessing import StandardScaler
mask = df['Col1'].notnull()
df.loc[mask, 'Ref'] = StandardScaler().fit_transform(df.loc[mask, ['Col1']])
print(df)
No Col1 Ref
0 12 10.0 -0.327089
1 23 NaN NaN
2 34 5.0 -1.027992
3 45 NaN NaN
4 54 22.0 1.355081
Detail:
print(StandardScaler().fit_transform(df.loc[mask, ['Col1']]))
[[-0.32708852]
[-1.02799249]
[ 1.35508101]]
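An alternative, sketched here under a few assumptions, is to wrap the shorter array in a Series that carries the index labels of the rows that were kept, and let pandas align it on assignment. The sample frame below is rebuilt from the table in the question, and the standardization is done by hand with numpy (zero mean, population standard deviation) purely as a stand-in for StandardScaler, so the snippet needs nothing beyond pandas and numpy. The name `ref` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumed values).
df = pd.DataFrame({
    "No": [12, 23, 34, 45, 54],
    "Col1": [10.0, np.nan, 5.0, np.nan, 22.0],
})

# True for rows where Col1 holds a value.
mask = df["Col1"].notna()

# Stand-in for StandardScaler().fit_transform(...): zero mean, unit variance,
# using the population standard deviation (ddof=0). Shape is (n_valid, 1).
valid = df.loc[mask, ["Col1"]].to_numpy()
scaled = (valid - valid.mean(axis=0)) / valid.std(axis=0)

# Flatten to 1-D and attach the index labels of the rows that were kept.
ref = pd.Series(scaled.ravel(), index=df.index[mask])

# Assignment aligns on the index; rows absent from `ref` become NaN.
df["Ref"] = ref
print(df)
```

Because the alignment is done by index label rather than by position, this should also work when the frame has a non-default index, as long as the labels are unique.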