How to fill dataframe NaN values with empty list [] in pandas?
This is my dataframe:
date ids
0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1 2011-04-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2 2011-04-25 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3 2011-04-26 NaN
4 2011-04-27 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5 2011-04-28 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
I want to replace NaN with []. How can I do that? fillna([]) did not work. I even tried replace(np.nan, []), but it gives this error:
TypeError('Invalid "to_replace" type: \'float\'',)
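For reference, the fillna failure described above can be reproduced on a small frame. This is a minimal sketch: the sample data is made up for illustration, and because the exact exception type and message may differ between pandas versions, the code catches both TypeError and ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question: one row holds NaN instead of a list
df = pd.DataFrame({
    "date": pd.to_datetime(["2011-04-25", "2011-04-26", "2011-04-27"]),
    "ids": [[0, 1, 2], np.nan, [0, 1, 2]],
})

try:
    df["ids"].fillna([])
    fillna_error = None
except (TypeError, ValueError) as exc:
    # fillna documents its value as a scalar, dict, Series, or DataFrame, not a list
    fillna_error = exc

print(type(fillna_error).__name__, fillna_error)
```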
My approach is similar to @hellpanderrr's, but it tests for list-ness rather than using isnan:
df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])
I originally tried using pd.isnull (or pd.notnull), but when given a list, that returns the null-ness of each element.
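The difference between the two tests can be seen in a short sketch. The sample Series here is hypothetical; the point is only that pd.isnull on a list answers element by element, while isinstance gives a single answer per cell.

```python
import numpy as np
import pandas as pd

# pd.isnull on a list is element-wise, so it cannot serve directly as a per-cell test
elementwise = pd.isnull([1, None, 3])
print(elementwise)  # a boolean array with one entry per list element

# Hypothetical sample column mixing lists and NaN
ids = pd.Series([[0, 1], np.nan, [2, 3]])

# isinstance yields exactly one boolean per cell, whatever the cell contains
cleaned = ids.apply(lambda d: d if isinstance(d, list) else [])
print(cleaned.tolist())
```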
After a lot of head-scratching I found this method, which should be the most efficient (no looping, no apply), just assigning to a slice:
isnull = df.ids.isnull()
df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]
The trick is to construct a list of [] of the right size (isnull.sum()) and then enclose it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) containing empty lists as elements.
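One caveat when building the fill value with list multiplication: [[]] * n repeats a reference to a single list rather than creating n separate lists. The plain-Python sketch below (not pandas-specific, with a made-up n) shows the difference; whether it matters in practice depends on whether the filled cells are mutated in place later.

```python
n = 3

# [[]] * n repeats a reference to one list, so every entry is the same object
shared = [[]] * n
shared[0].append("x")
print(shared)  # the appended item appears in every entry

# A comprehension builds a distinct list for each entry
independent = [[] for _ in range(n)]
independent[0].append("x")
print(independent)  # only the first entry changed
```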
A simple solution would be:
df['ids'].fillna("").apply(list)
As noted by @timgeb, this requires df['ids'] to contain only lists or NaN.
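To illustrate why that restriction matters, here is a small sketch on a hypothetical Series: list("") is an empty list, so NaN cells become [], but any cell that is already a string would be split into its characters.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing only lists and NaN
ids = pd.Series([[1, 2], np.nan, [3]])

# NaN -> "" -> list("") == []; existing lists are copied by list(...)
filled = ids.fillna("").apply(list)
print(filled.tolist())

# A string cell would not survive intact: list() splits it into characters
print(list("ab"))
```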
You can first use loc to locate all rows that have a NaN in the ids column, and then loop through these rows using at to set their values to an empty list:
for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []
>>> df
date ids
0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26 []
4 2011-04-27 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not DataFrame.fillna, so if you want to work on a single column you can use this:
>>> df
A B C
0 0.0 2.0 NaN
1 NaN NaN 5.0
2 NaN 7.0 NaN
>>> df['C'].fillna({i: [] for i in df.index})
0 []
1 5
2 []
Name: C, dtype: object
The solution can be extended to DataFrames by applying it to every column.
>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
A B C
0 0 2 []
1 [] [] 5
2 [] 7 []
Note: for large Series/DataFrames with few missing values, this might create an unreasonable amount of throwaway empty lists.
Tested with pandas 1.0.5.
Another solution using numpy:
df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)
Or using combine_first:
df.ids = df.ids.combine_first(pd.Series([[]]*len(df)))
Maybe not the shortest or most optimized solution, but I think it is pretty readable:
# Packages
import ast
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)
The drawback is that you need to import the ast module.
EDIT
I recently discovered the eval() built-in. This avoids importing any extra package.
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
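A note on the trade-off between the two variants: here the string being evaluated is the fixed literal '[]', so both give the same result. In general, though, eval() executes any expression it is given, while ast.literal_eval accepts only Python literals. The sketch below uses made-up input strings to show the difference.

```python
import ast

# literal_eval parses Python literals only
print(ast.literal_eval("[]"))

# Anything that is not a literal (here, a function call) is rejected
try:
    ast.literal_eval("len('abc')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)

# eval executes the same string as code
print(eval("len('abc')"))
```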
Without assignments:
1) Assuming we have only floats and integers in our dataframe
import math
df.apply(lambda x:x.apply(lambda x:[] if math.isnan(x) else x))
2) For any dataframe
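import math
def isnan(x):
    # math.isnan raises TypeError on non-real values, so check the type first
    # (Python 3 has no separate long type, and math.isnan rejects complex)
    return isinstance(x, (int, float)) and math.isnan(x)
df.apply(lambda x: x.apply(lambda x: [] if isnan(x) else x))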
Maybe more compact:
df['ids'] = [[] if type(x) != list else x for x in df['ids']]
This is probably faster, a one-liner solution:
df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)
Another solution that is explicit:
# use apply to only replace the nulls with the list
df.loc[df.ids.isnull(), 'ids'] = df.loc[df.ids.isnull(), 'ids'].apply(lambda x: [])
Create a function that checks your condition and otherwise returns an empty list, empty set, etc.
Then apply that function to the column, assigning the result either back to the old column or to a new one if you wish.
aa = pd.DataFrame({'d': [1, 1, 2, 3, 3, np.nan], 'r': [3, 5, 5, 5, 5, 'e']})
def check_condition(x):
    # NaN > 0 is False, so missing values fall through to the empty list
    if x > 0:
        return x
    else:
        return list()
aa['d'] = aa.d.apply(check_condition)
You can try this:
df.fillna(df.notna().applymap(lambda x: x or []))
The fillna method does not support a list, but you can use a dict instead.
df.fillna({})