
What is the difference between combine_first and fillna?

These two functions seem equivalent to me. In the code below they accomplish the same goal: columns c and d come out equal. So when should I use one over the other?

Here is an example:

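import pandas as pd
import numpy as np

# 10 rows of random integers in [0, 10) in columns 'a' and 'b'
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))
# blank out every other row of 'a' (int64 cannot hold NaN, so 'a' shows up as float)
df.loc[::2, 'a'] = np.nan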

Returns:

     a  b
0  NaN  4
1  2.0  6
2  NaN  8
3  0.0  4
4  NaN  4
5  0.0  8
6  NaN  7
7  2.0  2
8  NaN  9
9  7.0  2

This is my starting point. Now I add two columns, one built with combine_first and one with fillna, and they produce the same result:

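df['c'] = df['a'].combine_first(df['b'])  # value from a where present, otherwise from b
df['d'] = df['a'].fillna(df['b'])         # NaNs in a filled from b at the same index label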

Returns (from a separate run, so the random values in a and b differ from the table above):

     a  b    c    d
0  NaN  4  4.0  4.0
1  8.0  7  8.0  8.0
2  NaN  2  2.0  2.0
3  3.0  0  3.0  3.0
4  NaN  0  0.0  0.0
5  2.0  4  2.0  2.0
6  NaN  0  0.0  0.0
7  2.0  6  2.0  2.0
8  NaN  4  4.0  4.0
9  4.0  6  4.0  4.0

Credit to this question for the data set: Combine Pandas data frame column values into new column

combine_first is intended for the case where the two objects have non-overlapping index labels or columns. It fills in nulls, and it also supplies values for index labels and columns that don't exist in the first object at all.

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])

     w    x    y  
a  1.0  2.0  3.0  
b  4.0  NaN  5.0  

dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

     x    y    z
b  1.0  2.0  3.0
c  3.0  4.0  5.0

dfa.combine_first(dfb)

     w    x    y    z
a  1.0  2.0  3.0  NaN
b  4.0  1.0  5.0  3.0  # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
c  NaN  3.0  4.0  5.0  # whole new index

Notice that all index labels and columns from both frames are included in the result.
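One way to build intuition for what combine_first does is to reproduce its result by hand: first grow dfa to the union of both frames' row and column labels, then fill the remaining holes from dfb. This is a sketch for intuition only, not a claim about how pandas implements combine_first internally; the names `rows`, `cols`, `grown` and `manual` are illustrative. The comparison ignores dtypes, because the two routes can end up with int in one and float in the other for some columns.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

# Step 1: extend dfa to every row and column label found in either frame.
rows = dfa.index.union(dfb.index)      # a, b, c
cols = dfa.columns.union(dfb.columns)  # w, x, y, z
grown = dfa.reindex(index=rows, columns=cols)

# Step 2: fill the NaNs from dfb; a DataFrame passed to fillna is aligned on both axes.
manual = grown.fillna(dfb)

# Same values and NaN positions as combine_first (dtypes and label order not compared).
pd.testing.assert_frame_equal(manual, dfa.combine_first(dfb),
                              check_dtype=False, check_like=True)
print(manual)
```

Seen this way, the extra behavior of combine_first is mostly step 1, which fillna never does on its own.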

Now compare fillna:

dfa.fillna(dfb)

   w    x  y
a  1  2.0  3
b  4  1.0  5  # 1.0 filled in from `dfb`

Notice that no new columns or index labels from dfb are included. We only filled in the null value at a position where dfa and dfb share both the index label and the column.
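Conversely, fillna always keeps exactly the caller's shape. The minimal sketch below (same two frames; `filled` and `trimmed` are just illustrative names) checks that, and shows that trimming combine_first's output back to dfa's labels with reindex_like yields the same values as fillna. Dtypes are again not compared, since combine_first can turn an int column into float.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

filled = dfa.fillna(dfb)

# fillna never adds labels: rows and columns are exactly dfa's.
assert filled.index.equals(dfa.index)
assert filled.columns.equals(dfa.columns)

# Cutting combine_first's result back to dfa's labels gives the same values.
trimmed = dfa.combine_first(dfb).reindex_like(dfa)
pd.testing.assert_frame_equal(filled, trimmed, check_dtype=False)
print(filled)
```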


In your case, you call fillna and combine_first on a single column, and both Series share the same index. In that situation the two calls do effectively the same thing.
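To make the Series case concrete, here is a small sketch with made-up data (the Series `a`, `b` and `b2` and their values are purely illustrative). With an identical index the two methods agree; as soon as the filler has a label the caller lacks, combine_first adds it to the result while fillna ignores it.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2.0, np.nan], index=[0, 1, 2])

# Case 1: the filler shares a's index exactly, so both methods give the same Series.
b = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])
assert a.combine_first(b).equals(a.fillna(b))

# Case 2: the filler has an extra label (3) and lacks label 2.
b2 = pd.Series([10.0, 20.0, 40.0], index=[0, 1, 3])
via_combine = a.combine_first(b2)  # index 0, 1, 2, 3: label 3 is taken from b2
via_fillna = a.fillna(b2)          # index 0, 1, 2: label 3 is dropped
print(via_combine)
print(via_fillna)
```

In both results label 2 stays NaN, because b2 has nothing to offer there; the only difference is whether b2's extra label makes it into the output.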

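Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.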