
Pandas - replace all NaN values in DataFrame with empty python dict objects

I have a pandas DataFrame where each cell contains a python dict.

>>> from pandas import DataFrame
>>> data = {'Q':{'X':{2:2010}, 'Y':{2:2011, 3:2009}},'R':{'X':{1:2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

I'd like to replace the NaN with an empty dict, to get this result:

                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        {}

However, because the fillna function interprets an empty dict not as a scalar value but as a mapping of column --> value, it does nothing if I simply do this (i.e. it doesn't work):

>>> frame.fillna(inplace=True, value={})
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

Is there any way to use fillna to accomplish what I want? Do I have to iterate through the entire DataFrame or construct a silly dict with all my columns mapped to empty dict?
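To make the column --> value behaviour concrete, here is a minimal sketch on a small numeric frame. The frame `df` and its column names are made up for illustration and are not part of the question. It shows a dict being read as per-column fill values, and an empty dict filling nothing because it names no columns.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame, used only to show how fillna reads a dict.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Keys are column labels, values are the per-column fill values.
per_column = df.fillna({"a": 0.0, "b": -1.0})

# An empty dict names no columns, so no cell is filled.
unchanged = df.fillna({})

print(per_column)
print(int(unchanged.isna().sum().sum()))
```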

I was able to use DataFrame.applymap in this way:

>>> from pandas import isnull
>>> frame=frame.applymap(lambda x: {} if isnull(x) else x)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

This solution avoids the pitfalls in both EdChum's solution (where all NaN cells wind up pointing at the same underlying dict object in memory, preventing them from being updated independently of one another) and Shashank's (where a potentially large data structure needs to be constructed with nested dicts, just to specify a single empty dict value).
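The independence claim can be checked with a short sketch. The frame below is a hypothetical variant with two missing cells, so that sharing would be visible. It also assumes that newer pandas releases expose the same elementwise operation as DataFrame.map, and falls back to applymap when that method is absent.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing cells, so sharing would be visible.
frame = pd.DataFrame(
    {"Q": [{2: 2010}, np.nan], "R": [np.nan, {1: 2013}]},
    index=["X", "Y"],
)

# Assumption: DataFrame.map is the newer name for applymap; fall back if absent.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap

# The lambda runs once per cell, so each NaN gets a freshly created dict.
filled = elementwise(lambda x: {} if pd.isna(x) else x)

# Mutating one filled cell leaves the other filled cell untouched.
filled.at["Y", "Q"]["note"] = "only here"
print(filled)
```

Because the `{}` literal is evaluated on every call, the two filled cells hold different dict objects.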

DataFrame.where is a way of achieving this quite directly:

>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

>>> frame.where(frame.notna(), lambda x: [{}])
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

Also, it appears to be a bit faster:

>>> %timeit frame.where(frame.notna(), lambda x: [{}])
791 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit frame.applymap(lambda x: {} if isnull(x) else x)
1.07 ms ± 7.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

(on larger datasets I've observed speedups of ~10x)

The problem is that when a dict is passed to fillna, it tries to fill the values based on the columns in the frame. So the first solution I tried was:

frame.fillna({column: {} for column in frame.columns})

But if a dictionary is provided at the second level like this, it tries to match the keys against the index, so the solution that worked was:

frame.fillna({column: {ind: {} for ind in frame.index} for column in frame.columns})

Which gives:

                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

EdChum's answer is probably better for your needs, but this can be used when you don't want to make changes in place.

EDIT: The solution above works well for smaller frames, but can be a problem for larger frames. Using replace can solve that.

import numpy as np
frame.replace(np.nan, {column: {} for column in frame.columns})

This works using loc:

In [6]:

frame.loc[frame['R'].isnull(), 'R'] = {}
frame
Out[6]:
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

Use the .values accessor to assign into the numpy array directly:

frame.R = frame.R.astype(object)  # make sure the column has object dtype so it can hold dicts

frame.R.values[frame.R.isnull()] = {}  # write through to the underlying numpy array
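One thing to be aware of with a masked assignment of a single `{}`: the one dict is broadcast, so every filled slot refers to the same object. The sketch below illustrates this on a plain NumPy object array standing in for a column's values (the arrays `shared` and `independent` are hypothetical, not taken from the answer), and contrasts it with assigning slot by slot.

```python
import numpy as np
import pandas as pd

# Plain object array standing in for a column's underlying values (illustrative).
shared = np.array([{1: 2013}, np.nan, np.nan], dtype=object)
mask = pd.isna(shared)

# A single dict is broadcast: every masked slot references the same object.
shared[mask] = {}
same_object = shared[1] is shared[2]

# Assigning slot by slot creates one dict per missing cell instead.
independent = np.array([{1: 2013}, np.nan, np.nan], dtype=object)
for i in np.flatnonzero(pd.isna(independent)):
    independent[i] = {}
distinct_objects = independent[1] is not independent[2]

print(same_object, distinct_objects)
```

If the filled cells will only ever be read, the shared dict is harmless; if they will be updated later, the per-slot loop is the safer choice.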

@Josh_Bode's answer helped me a lot. Here's a very slightly different version. I used mask() instead of where() (pretty trivial change). I also updated the way to assign an empty dictionary. By creating a list of dict instances as long as the frame and then assigning that, I avoided the trap of many references to the same dict.

>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

>>> frame.mask(frame.isna(), lambda x: [{} for _ in range(len(frame))])
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}
