
Python - add a numpy array as a column to a pandas dataframe with different length

I have a pandas dataframe df with multiple columns. One of the columns is Col1, which contains float values or NaNs:

df
+----+------+-----+
| No | Col1 | ... |
+----+------+-----+
| 12 |   10 | ... |
| 23 |  NaN | ... |
| 34 |    5 | ... |
| 45 |  NaN | ... |
| 54 |   22 | ... |
+----+------+-----+

I run a function over Col1, excluding missing values (NaN), like this:

StandardScaler().fit_transform(df.loc[pd.notnull(df['Col1']), ['Col1']])

Imagine the result is a numpy.ndarray like this:

+-----+
| Ref |
+-----+
|   2 |
|   5 |
|   1 |
+-----+

Notice that this array does not have the same length as the original column Col1.
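
To make the length problem concrete, here is a minimal sketch, assuming a frame rebuilt from the table above (the `...` columns are left out) and a stand-in array `ref` holding the three example values. Assigning the shorter array directly should fail with a ValueError, which is why the rows have to be matched up explicitly:

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's table (illustrative; "..." columns omitted).
df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})

# Stand-in for the function result: 3 values for a 5-row frame.
ref = np.array([2, 5, 1])

try:
    df["Ref"] = ref  # lengths differ, so pandas refuses the assignment
    error = None
except ValueError as exc:
    error = exc

print(error)
```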

I need a way to add the array Ref as a column to df. For each row where Col1 is NaN, the new column Ref should be NaN too. The desired output would look like this:

+----+------+-----+-----+
| No | Col1 | ... | Ref |
+----+------+-----+-----+
| 12 |   10 | ... |   2 |
| 23 |  NaN | ... | NaN |
| 34 |    5 | ... |   5 |
| 45 |  NaN | ... | NaN |
| 54 |   22 | ... |   1 |
+----+------+-----+-----+

I think you can assign to a new column, filtered by the same boolean mask:

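from sklearn.preprocessing import StandardScaler

# True for rows where Col1 holds a value, False where it is NaN
mask = df['Col1'].notnull()
# Scale only the non-NaN rows and write the result back to the same rows;
# rows outside the mask are left as NaN in the new column
df.loc[mask, 'Ref'] = StandardScaler().fit_transform(df.loc[mask, ['Col1']])
print(df)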
   No  Col1       Ref
0  12  10.0 -0.327089
1  23   NaN       NaN
2  34   5.0 -1.027992
3  45   NaN       NaN
4  54  22.0  1.355081
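
For readers without scikit-learn installed, the same mask-based assignment can be sketched with pandas alone. This is an illustration rather than the answer's exact code: the frame is rebuilt from the question's table (without the `...` columns), and the scaling is done by hand as a z-score with the population standard deviation (`ddof=0`), which is what StandardScaler computes.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's table (illustrative; "..." columns omitted).
df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})

# Boolean mask: True where Col1 has a value.
mask = df["Col1"].notna()
valid = df.loc[mask, "Col1"]

# Hand-written z-score using the population standard deviation (ddof=0).
scaled = (valid - valid.mean()) / valid.std(ddof=0)

# Write the three scaled values into the three masked rows;
# the remaining rows of the new column stay NaN.
df.loc[mask, "Ref"] = scaled.to_numpy()
print(df)
```

The key point is that the same mask selects the rows on both sides of the assignment, so the number of values always matches the number of target rows.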

Detail:

print(StandardScaler().fit_transform(df.loc[mask, ['Col1']]))
[[-0.32708852]
 [-1.02799249]
 [ 1.35508101]]
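
The output above is a 2-D array with a single column. An alternative sketch, not part of the original answer, is to flatten it and wrap it in a Series indexed by the rows that passed the mask. pandas then aligns on the index when the column is assigned, and unmatched rows become NaN. The array `ref` below is a stand-in using the question's example values.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's table (illustrative; "..." columns omitted).
df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})
mask = df["Col1"].notna()

# Stand-in for the transformer output: shape (3, 1), one row per non-NaN value.
ref = np.array([[2], [5], [1]])

# Flatten to 1-D and label each value with the index of the row it came from.
ref_series = pd.Series(ref.ravel(), index=df.loc[mask].index)

# Column assignment aligns on the index; rows 1 and 3 have no match and get NaN.
df["Ref"] = ref_series
print(df)
```

This relies on the array keeping the row order of the masked frame, which holds as long as nothing reorders the rows between masking and assignment.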


