
Replace zeros in one dataframe with values from another dataframe

I have two dataframes, df1 and df2. df1 is shown here:

   age
0   42
1   52
2   36
3   24
4   73

df2 is shown here:

   age
0    0
1    0
2    1
3    0
4    0

I want to replace all the zeros in df2 with the corresponding entries in df1. More precisely, if the element at a given index in df2 is zero, I want that element replaced by the entry at the same index in df1.

Hence, I want df2 to look like:

   age
0   42
1   52
2    1
3   24
4   73

I tried using the replace method but it is not working. Please help :) Thanks in advance.

You could use where:

In [19]: df2.where(df2 != 0, df1)
Out[19]: 
   age
0   42
1   52
2    1
3   24
4   73

Above, df2 != 0 is a boolean DataFrame.

In [16]: df2 != 0
Out[16]: 
     age
0  False
1  False
2   True
3  False
4  False

df2.where(df2 != 0, df1) returns a new DataFrame. Where df2 != 0 is True, the corresponding value of df2 is used. Where it is False, the corresponding value of df1 is used.
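A self-contained sketch of the where approach, rebuilding df1 and df2 from the question. It also confirms that df2 itself is not modified, because where returns a new DataFrame:

```python
import pandas as pd

# Rebuild the example frames from the question
df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Keep df2 where it is non-zero; everywhere else take the value from df1
result = df2.where(df2 != 0, df1)
print(result)

# where() returned a new frame; df2 still holds its zeros
print(df2["age"].tolist())
```

If you want to overwrite df2, assign the result back (df2 = df2.where(df2 != 0, df1)).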


Another alternative is to assign in place with df.loc:

df2.loc[df2['age'] == 0, 'age'] = df1['age']

df.loc[mask, col] selects the rows of df where the boolean Series mask is True, restricted to the column labeled col.

In [17]: df2.loc[df2['age'] == 0, 'age']
Out[17]: 
0    0
1    0
3    0
4    0
Name: age, dtype: int64

When used in an assignment, such as df2.loc[df2['age'] == 0, 'age'] = df1['age'], pandas performs automatic index label alignment. (Notice that the index labels above are 0, 1, 3, 4, with 2 skipped.) So the values in df2.loc[df2['age'] == 0, 'age'] are replaced by the values in df1['age'] that carry the same index labels. Even though df1['age'] is a Series with index labels 0, 1, 2, 3 and 4, the value at label 2 is ignored because there is no corresponding index label on the left-hand side.

In other words,

df2.loc[df2['age'] == 0, 'age'] = df1.loc[df2['age'] == 0, 'age']

would work as well, but the added restriction on the right-hand side is unnecessary.
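A self-contained sketch of both loc assignments. To make the label alignment visible, the unrestricted version uses df1_reversed, a hypothetical copy of df1 with its rows in reverse order; the values still land on matching index labels rather than matching positions:

```python
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
# Hypothetical copy of df1 stored in reverse row order (labels 4, 3, 2, 1, 0);
# the index labels still record which row each value belongs to
df1_reversed = df1.iloc[::-1]

# Unrestricted right-hand side: alignment matches labels 0, 1, 3, 4 and skips 2
a = pd.DataFrame({"age": [0, 0, 1, 0, 0]})
a.loc[a["age"] == 0, "age"] = df1_reversed["age"]

# Right-hand side restricted with the same mask (computed once, before b changes)
b = pd.DataFrame({"age": [0, 0, 1, 0, 0]})
mask = b["age"] == 0
b.loc[mask, "age"] = df1.loc[mask, "age"]

print(a["age"].tolist())
print(a.equals(b))
```

Both versions produce the same frame, which is why the extra restriction on the right-hand side is redundant.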

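Alternatively, you could mask the zeros (turning them into NaN) and then fill those gaps from df1 with combine_first. Because of the intermediate NaN values, the column is upcast to float:

In [30]: df2.mask(df2==0).combine_first(df1)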
Out[30]:
    age
0  42.0
1  52.0
2   1.0
3  24.0
4  73.0
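Step by step, with the frames rebuilt from the question, this shows where the float dtype comes from. The final astype call is an addition (not part of the answer above) that casts back to integers once no NaN remain:

```python
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Step 1: mask() turns every zero into NaN, which upcasts the column to float64
masked = df2.mask(df2 == 0)
print(masked.dtypes)

# Step 2: combine_first() fills the NaN holes from df1, aligned on index labels
filled = masked.combine_first(df1)

# Optional: no NaN are left, so the column can be cast back to integers
result = filled.astype("int64")
print(result)
```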

Or, "negating" @unutbu's beautiful where solution by using mask:

In [46]: df2.mask(df2==0, df1)
Out[46]:
   age
0   42
1   52
2    1
3   24
4   73
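A quick check, with the frames from the question rebuilt, that mask with the condition df2 == 0 and where with the condition df2 != 0 give the same result:

```python
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# mask() replaces values where the condition is True;
# where() replaces values where the condition is False
via_mask = df2.mask(df2 == 0, df1)
via_where = df2.where(df2 != 0, df1)

print(via_mask.equals(via_where))
```

Unlike mask(df2==0).combine_first(df1), passing df1 directly never introduces NaN, so the integer dtype is preserved.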

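Or try mul (note that this relies on df2 containing only 0s and 1s, and on df1 containing no zeros):

import numpy as np
df1.mul(np.where(df2==1,0,1)).replace({0:1})  # zero out rows where df2 is 1, then turn those zeros into 1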
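To illustrate the limitation of the mul approach, the sketch below uses df2_other, a hypothetical variant of df2 whose kept value is 2 rather than 1. The mul expression only treats 1 as "keep", so that value is overwritten by df1, while where keeps any non-zero value:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
# Hypothetical variant of df2 in which the value to keep is 2 instead of 1
df2_other = pd.DataFrame({"age": [0, 0, 2, 0, 0]})

# np.where(...) yields 1 everywhere here (no entry equals 1), so mul returns df1 unchanged
via_mul = df1.mul(np.where(df2_other == 1, 0, 1)).replace({0: 1})
# where() keeps every non-zero value from df2_other
via_where = df2_other.where(df2_other != 0, df1)

print(via_mul["age"].tolist())
print(via_where["age"].tolist())
```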
