Pandas DataFrame column derived from data containing NaN values

Question

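The following code converts a given pandas column, FEAT, into a new binary feature named STREAM. The program works as long as the original DataFrame contains no NaN values. When it does contain NaN, the following exception is raised: ValueError: Length of values does not match length of index. I need the NaN values to be carried over into the new column. Is that possible? Here is the failing code: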

import pandas as pd
import numpy as np
data = {
    'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]
}
customer = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'])
#create binary variable STREAM 0:mainstream 1:avantgarde
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]
# convert FEAT to list_0
list_0 = customer['FEAT'].values.tolist()
# create a list of length = len(customer) whose elements are:
#  0 if the value of 'FEAT' is in stream_0
#  1 if the value of 'FEAT' is in stream_1
L = []
for i in list_0:
    if i in stream_0:
        L.append(0)
    elif i in stream_1:
        L.append(1)
# convert the list to a new column of customer df
customer['STREAM'] = L
print(customer)

Answer 1

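The problem is that you are missing an else block, so when a value (such as NaN) is in neither stream_0 nor stream_1, nothing is appended. As a result, L ends up with fewer elements than customer has rows.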
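As a minimal sketch of that diagnosis, the original loop can be kept and given an else branch that appends np.nan, so that L always has one entry per row. The data and the two stream lists are copied from the question; nothing else is assumed.

```python
import numpy as np
import pandas as pd

customer = pd.DataFrame(
    {'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]},
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

L = []
for i in customer['FEAT'].tolist():
    if i in stream_0:
        L.append(0)
    elif i in stream_1:
        L.append(1)
    else:
        # NaN (or any value found in neither list) still gets an entry,
        # so len(L) always equals len(customer)
        L.append(np.nan)

customer['STREAM'] = L
print(customer)
```

With the else branch in place the assignment no longer raises, and the row whose FEAT is NaN gets NaN in STREAM.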

No loop is needed here; np.select can build the column. Its default argument plays the role of the else block.

customer['STREAM'] = np.select([customer.FEAT.isin(stream_0), customer.FEAT.isin(stream_1)],
                                [0, 1], default=np.nan)

        FEAT  STREAM
June     8.0     0.0
Robert  15.0     1.0
Lily     7.0     1.0
David    NaN     NaN
Bob      5.0     0.0
Sally    2.0     1.0
Mia     11.0     1.0
Luis    15.0     1.0
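Because NaN is a float, STREAM comes back as float64 (0.0 and 1.0) rather than as integers. If integer flags are preferred, one option is pandas' nullable Int64 dtype. This is a sketch under the assumption that integer output is wanted; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

customer = pd.DataFrame(
    {'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]},
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

# Same np.select call as above: rows matching neither list fall back to NaN
customer['STREAM'] = np.select(
    [customer['FEAT'].isin(stream_0), customer['FEAT'].isin(stream_1)],
    [0, 1],
    default=np.nan,
)

# Optional step (assumption: integer flags are desired).
# 'Int64' (capital I) is the nullable integer dtype; missing values become <NA>.
customer['STREAM'] = customer['STREAM'].astype('Int64')
print(customer)
```

The cast only succeeds because every non-missing value is a whole number; the missing entry is then shown as <NA> instead of NaN.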

You can also map the values; anything not in the mapping becomes NaN.

d = {key: value for keys, value in zip([stream_0, stream_1], [0, 1]) for key in keys}
customer['STREAM'] = customer['FEAT'].map(d)

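The dict comprehension creates the key-value pairs: each key in stream_0 is assigned the value 0, and each key in stream_1 is assigned the value 1. The nested comprehension is a bit dense, so an easier-to-follow approach is to build each dict separately and then combine them.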

d_1 = {k: 0 for k in stream_0}
d_2 = {k: 1 for k in stream_1}
d = {**d_1, **d_2}  # Combine
#{1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 1,
# 8: 0, 9: 1, 10: 0, 11: 1, 12: 0, 13: 1, 14: 0, 15: 1}
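Putting the pieces together, the sketch below builds the lookup dict, applies it with map, and checks that the result matches the np.select version. The names via_map and via_select, and the comparison itself, are additions for illustration only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

customer = pd.DataFrame(
    {'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]},
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)
stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

# Lookup table: every stream_0 value maps to 0, every stream_1 value maps to 1
d = {**{k: 0 for k in stream_0}, **{k: 1 for k in stream_1}}

# Values missing from d (including NaN) map to NaN
via_map = customer['FEAT'].map(d)

# The np.select version, wrapped in a Series so the two can be compared
via_select = pd.Series(
    np.select(
        [customer['FEAT'].isin(stream_0), customer['FEAT'].isin(stream_1)],
        [0, 1],
        default=np.nan,
    ),
    index=customer.index,
)

# Raises AssertionError if the two approaches disagree
pd.testing.assert_series_equal(via_map, via_select, check_names=False)
customer['STREAM'] = via_map
print(customer)
```

Either approach works for this data; map is arguably simpler when the rule is a pure value-to-value lookup, while np.select generalizes to arbitrary boolean conditions.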

Answer 1: score 2, accepted, 2020-05-13 15:52:20