
Python - add a numpy array as column to a pandas dataframe with different length

I have a pandas dataframe df with multiple columns. One of the columns is Col1 which contains float values or NaNs:

df
+----+------+-----+
| No | Col1 | ... |
+----+------+-----+
| 12 |   10 | ... |
| 23 |  NaN | ... |
| 34 |    5 | ... |
| 45 |  NaN | ... |
| 54 |   22 | ... |
+----+------+-----+

I run a function over Col1, excluding missing values (NaN), like this:

StandardScaler().fit_transform(df.loc[pd.notnull(df['Col1']), ['Col1']])

Imagine the result is a numpy.ndarray like this:

+-----+
| Ref |
+-----+
|   2 |
|   5 |
|   1 |
+-----+

Notice that this array does not have the same length as the original column Col1.

I need a way to add the array Ref as a column to df. For each row where Col1 is NaN, the new column Ref should be NaN too. The desired output looks like this:

+----+------+-----+-----+
| No | Col1 | ... | Ref |
+----+------+-----+-----+
| 12 |   10 | ... |   2 |
| 23 |  NaN | ... | NaN |
| 34 |    5 | ... |   5 |
| 45 |  NaN | ... | NaN |
| 54 |   22 | ... |   1 |
+----+------+-----+-----+

You can assign the result to a new column through the same boolean mask, so only the rows where Col1 is not missing are filled and the remaining rows stay NaN:

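from sklearn.preprocessing import StandardScaler

# Boolean mask of the rows where Col1 is not missing
mask = df['Col1'].notnull()
# Scale only those rows and write the result back to the same rows;
# rows outside the mask stay NaN in the new column
df.loc[mask, 'Ref'] = StandardScaler().fit_transform(df.loc[mask, ['Col1']])
print(df)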
   No  Col1       Ref
0  12  10.0 -0.327089
1  23   NaN       NaN
2  34   5.0 -1.027992
3  45   NaN       NaN
4  54  22.0  1.355081
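
An alternative that relies on index alignment instead of masked assignment can be sketched as follows. This is only an illustration under stated assumptions: the frame is rebuilt from the question's sample with the elided columns left out, and the scaling is done by hand with NumPy (zero mean, population standard deviation) as a stand-in for StandardScaler, so the sketch runs without scikit-learn.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; the elided columns are left out.
df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})

# Non-missing values, still carrying their original index labels (0, 2, 4).
col = df["Col1"].dropna()
values = col.to_numpy()

# Stand-in for StandardScaler (an assumption of this sketch):
# subtract the mean and divide by the population std (ddof=0).
scaled = (values - values.mean()) / values.std()

# Wrap the shorter array in a Series indexed by the non-missing rows.
# Column assignment aligns on the index, so the other rows become NaN.
df["Ref"] = pd.Series(scaled, index=col.index)
print(df)
```

The Series carries the row labels, so the shorter array does not need to match the length of df.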

Details: fit_transform returns a 2D array with one row per non-missing value:

print(StandardScaler().fit_transform(df.loc[mask, ['Col1']]))
[[-0.32708852]
 [-1.02799249]
 [ 1.35508101]]
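
The printed array has shape (3, 1): one column, one row per non-missing value. If a flat array is preferred for the assignment, it can be flattened first. The sketch below hard-codes the values printed above as a stand-in for the scaler output (the name scaled_2d is hypothetical) and rebuilds the sample frame without the elided columns.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; the elided columns are left out.
df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})
mask = df["Col1"].notna()

# Stand-in for the scaler output shown above (hypothetical name).
scaled_2d = np.array([[-0.32708852], [-1.02799249], [1.35508101]])
print(scaled_2d.shape)

# ravel() turns the (3, 1) array into a 1D array of length 3,
# which matches the number of rows selected by the mask.
df.loc[mask, "Ref"] = scaled_2d.ravel()
print(df)
```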

