How to fill dataframe NaN values with an empty list [] in pandas?

Question

This is my dataframe:

          date                          ids
0     2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1     2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2     2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3     2011-04-26  Nan
4     2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5     2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...

I want to replace Nan with []. How can I do that? fillna([]) did not work. I even tried replace(np.nan, []), but it raises an error:

 TypeError('Invalid "to_replace" type: \'float\'',)
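For anyone who wants to reproduce the situation, here is a minimal sketch. The frame is a shortened stand-in for the one above (the values are made up for illustration), and it only shows that fillna rejects a list as the fill value:

```python
import numpy as np
import pandas as pd

# Stand-in for the frame in the question (shortened, illustrative values)
df = pd.DataFrame({
    "date": ["2011-04-25", "2011-04-26", "2011-04-27"],
    "ids": [[0, 1, 2], np.nan, [0, 1, 2]],
})

# fillna expects a scalar, dict, Series or DataFrame as the fill value,
# so a list is rejected before any filling happens
try:
    df["ids"].fillna([])
    fillna_rejected_list = False
except TypeError as exc:
    fillna_rejected_list = True
    print(type(exc).__name__, exc)
```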

Answer 1

My approach is similar to @hellpanderrr's, but it tests for list-ness rather than using isnan:

df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])

I originally tried using pd.isnull (or pd.notnull), but when given a list, that returns the null-ness of each element.
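A small sketch of both points, assuming an illustrative column of lists with one missing value (the names `ids`, `elementwise` and `filled` are only for this example):

```python
import numpy as np
import pandas as pd

# Illustrative column: lists plus one missing value
ids = pd.Series([[0, 1, 2], np.nan, [3, 4]])

# On a list, pd.isnull works element by element and returns an array,
# so it cannot serve as a single True/False test inside the lambda
elementwise = pd.isnull([0, 1, 2])
print(elementwise)

# Keep anything that is already a list, replace everything else with a fresh []
filled = ids.apply(lambda d: d if isinstance(d, list) else [])
print(filled.tolist())
```

Because the lambda builds a new [] on every call, each filled row gets its own list object.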

Answer 2

After a lot of head-scratching I found this method, which should be the most efficient (no looping, no apply): just assign to a slice:

isnull = df.ids.isnull()

df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]

The trick is to construct a list of [] of the right size (isnull.sum()) and then enclose it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) containing empty lists as elements.
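One detail to keep in mind with [[]] * n (used here and in some other answers): the multiplication repeats a reference to a single list rather than creating n separate lists. A plain-Python sketch of the difference, with variable names chosen only for illustration:

```python
n = 3

# [[]] * n repeats ONE list object n times
shared = [[]] * n
shared[0].append(1)
print(shared)        # every slot shows the appended value

# A comprehension builds n independent lists
independent = [[] for _ in range(n)]
independent[0].append(1)
print(independent)   # only the first slot changed
```

If the filled cells are never mutated in place this makes no difference; if they are, the comprehension form avoids surprising shared state.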

Answer 3

A simple solution would be:

df['ids'].fillna("").apply(list)

As noted by @timgeb, this requires df['ids'] to contain only lists or nan.
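A short sketch of why this works and where the restriction comes from, assuming an illustrative column (the names `ids` and `filled` are not from the question):

```python
import numpy as np
import pandas as pd

ids = pd.Series([[0, 1, 2], np.nan, [3, 4]])

# list("") is [], and list(existing_list) is a shallow copy of that list
filled = ids.fillna("").apply(list)
print(filled.tolist())

# The restriction: any other iterable is expanded too,
# e.g. a string entry would be split into characters
chars = list("ab")
print(chars)
```

Note that the result has to be assigned back (for example to df['ids']) for the change to persist.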

Answer 4

You can first use loc to locate all rows that have a nan in the ids column, and then loop through these rows using at to set their values to an empty list:

for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []

>>> df
        date                                             ids
0 2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26                                              []
4 2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Answer 5

Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not for DataFrame.fillna. So if you want to work on a single column, you can use this:

>>> df
     A    B    C
0  0.0  2.0  NaN
1  NaN  NaN  5.0
2  NaN  7.0  NaN
>>> df['C'].fillna({i: [] for i in df.index})
0    []
1     5
2    []
Name: C, dtype: object

The solution can be extended to DataFrames by applying it to every column.

>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
    A   B   C
0   0   2  []
1  []  []   5
2  []   7  []

Note: for large Series/DataFrames with few missing values, this might create an unreasonable number of throwaway empty lists.

Tested with pandas 1.0.5.

Answer 6

Another solution using numpy:

df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)

Or using combine_first:

df.ids = df.ids.combine_first(pd.Series([[]]*len(df)))

Answer 7

Maybe not the shortest or most optimized solution, but I think it is pretty readable:

# Packages
import ast

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)

The drawback is that you need to import the ast module.

EDIT

I recently discovered the eval() built-in. This avoids importing any extra module.

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
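The two variants are not equivalent from a safety point of view: eval() evaluates arbitrary expressions, while ast.literal_eval only accepts Python literals. A minimal sketch of the difference (the test string is purely illustrative):

```python
import ast

# Both turn the placeholder string into an empty list
from_literal_eval = ast.literal_eval("[]")
print(from_literal_eval)

# literal_eval refuses anything that is not a plain literal,
# whereas eval() would execute an expression like this one
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

For a fixed string such as '[]' either call gives the same result; the distinction matters only if the strings could come from untrusted data.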

Answer 8

Without assignments:

1) Assuming we have only floats and integers in our dataframe

import math
df.apply(lambda x:x.apply(lambda x:[] if math.isnan(x) else x))

2) For any dataframe

import math
def isnan(x):
    # Python 3 has no `long`, and math.isnan rejects complex numbers,
    # so only int and float are passed on to math.isnan
    return isinstance(x, (int, float)) and math.isnan(x)

df.apply(lambda x:x.apply(lambda x:[] if isnan(x) else x))
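The type check is needed because math.isnan raises a TypeError for anything that is not a real number. A plain-Python sketch, assuming a hypothetical helper named `is_nan` and made-up sample values:

```python
import math

def is_nan(x):
    # Hypothetical helper: only real numbers are handed to math.isnan
    return isinstance(x, (int, float)) and math.isnan(x)

# Without the guard, non-numeric values make math.isnan fail
try:
    math.isnan("text")
    unguarded_failed = False
except TypeError:
    unguarded_failed = True

# With the guard, mixed values pass through and only NaN is replaced
values = [1, float("nan"), "text", [0, 1], None]
cleaned = [[] if is_nan(v) else v for v in values]
print(cleaned)
```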

Answer 9

Maybe more compact:

df['ids'] = [[] if type(x) != list else x for x in df['ids']]

Answer 10

This is probably faster, a one-liner solution:

df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)

Answer 11

Another solution that is explicit:

# use apply to only replace the nulls with the list  
df.loc[df.ids.isnull(), 'ids'] = df.loc[df.ids.isnull(), 'ids'].apply(lambda x: [])

Answer 12

Create a function that checks your condition; if the condition is not met, it returns an empty list, empty set, etc.

Then apply that function to the variable, assigning the result back to the old variable or to a new one if you wish.

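import numpy as np
import pandas as pd

aa = pd.DataFrame({'d': [1, 1, 2, 3, 3, np.nan], 'r': [3, 5, 5, 5, 5, 'e']})


def check_condition(x):
    # NaN > 0 is False, so missing values fall through to the empty list
    if x > 0:
        return x
    else:
        return list()

aa['d'] = aa.d.apply(check_condition)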

Answer 13

You can try this:

df.fillna(df.notna().applymap(lambda x: x or []))

Answer 14

fillna does not accept a list, but you can use a dict instead.

df.fillna({})

Answer 1: 52, 2017-05-10 18:02:08
Answer 2: 33, 2017-03-22 17:36:39
Answer 3: 28, 2020-10-05 11:36:56
Answer 4: 25, accepted, 2015-10-18 17:00:03
Answer 5: 9, 2020-07-02 05:35:52
Answer 6: 3, 2019-11-15 02:25:50
Answer 7: 2, 2020-06-25 10:47:45
Answer 8: 1, 2015-10-18 19:28:45
Answer 9: 1, 2019-06-03 08:34:47
Answer 10: 1, 2020-04-01 15:56:21
Answer 11: 1, 2021-08-17 22:02:06
Answer 12: 0, 2017-12-04 17:55:07
Answer 13: 0, 2022-05-04 12:42:48
Answer 14: -6, 2017-11-02 08:18:41
