How to replace values in a range in a pandas dataframe with another value in the same dataframe based on a condition

I want to replace values within a range of columns in a dataframe with a corresponding value in another column if the value in the range is greater than zero.

I would think that a simple replace like this would work:

df = df.loc[:,'A':'D'].replace(1, df['column_with_value_I_want'])

But that in fact does nothing as far as I can tell except drop the column_with_value_I_want column, which is totally unintended, and I'm not sure why that happens.

This doesn't seem to work either:

df[df.loc[:,'A':'D']] > 0 = df['column_with_value_I_want']

It returns the error: SyntaxError: can't assign to comparison.

This seems like it should be straightforward, but I'm at a loss after trying several different things to no avail.

The dataframe I'm working with looks something like this:

df = pd.DataFrame({'A' : [1,0,0,1,0,0],
                   'B' : [1,0,0,1,0,1],
                   'C' : [1,0,0,1,0,1],
                   'D' : [1,0,0,1,0,0],
                   'column_with_value_I_want' : [22.0,15.0,90.0,10.,None,557.0],})

Not sure how to do it in Pandas per se, but it's not that difficult if you drop down to numpy.


If you're lucky enough that your entire DataFrame is numerical, you can do it as follows:

>>> import numpy as np
>>> # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
>>> m = df.to_numpy()
>>> pd.DataFrame(
...     np.where(np.logical_or(np.isnan(m), m > 0), np.tile(m[:, [4]], 5), m),
...     columns=df.columns)
      A      B      C     D  column_with_value_I_want
0  22.0   22.0   22.0  22.0                      22.0
1   0.0    0.0    0.0   0.0                      15.0
2   0.0    0.0    0.0   0.0                      90.0
3  10.0   10.0   10.0  10.0                      10.0
4   0.0    0.0    0.0   0.0                       NaN
5   0.0  557.0  557.0   0.0                     557.0

  • to_numpy converts a DataFrame to a numpy array (it replaces as_matrix, which was removed in pandas 1.0).
  • np.where is numpy's vectorized ternary conditional.
  • np.logical_or is numpy's element-wise or.
  • np.isnan checks whether a value is NaN.
  • np.tile tiles (in this case) a 2D single column into a matrix.
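
To make those building blocks concrete, here is a minimal sketch on a tiny array. The array `m` below is made up for illustration and is not the data from the question; its last column plays the role of the replacement column.

```python
import numpy as np

# Toy 2x3 array (illustrative values only); the last column holds the replacements.
m = np.array([[1.0, 0.0, 5.0],
              [0.0, np.nan, 7.0]])

last = m[:, [2]]            # keep it 2D: shape (2, 1)
tiled = np.tile(last, 3)    # repeat the column 3 times: shape (2, 3)

# True where the entry is NaN or positive
mask = np.logical_or(np.isnan(m), m > 0)

# Take the tiled value where mask is True, the original value elsewhere
result = np.where(mask, tiled, m)

# np.where broadcasts its arguments, so the (2, 1) column can be passed directly
result_broadcast = np.where(mask, last, m)
```

Because np.where broadcasts, the np.tile step is optional here; it just makes the shapes explicit.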

Unfortunately, the above will fail if some of your columns (even those not involved in this operation) are inherently non-numerical. In this case, you can do the following:

for col in ['A', 'B', 'C', 'D']:
    # where the value is positive, take the replacement column; otherwise keep the original
    df[col] = np.where(df[col] > 0, df.column_with_value_I_want, df[col])

which will work as long as the 5 relevant columns are numerical.

This uses a loop (which is frowned upon in numerical Python), but at least it loops over columns rather than rows. Assuming your data has many more rows than columns, it should be OK.
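
For what it's worth, a pandas-only route that avoids the explicit loop might look like the sketch below. It relies on DataFrame.mask with a Series as the replacement and axis=0 so the Series is aligned on the row index. The `label` column is a hypothetical addition of mine, included only to show that a non-numerical column elsewhere in the frame does not get in the way.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 0, 1, 0, 0],
                   'B': [1, 0, 0, 1, 0, 1],
                   'C': [1, 0, 0, 1, 0, 1],
                   'D': [1, 0, 0, 1, 0, 0],
                   'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
                   'label': list('abcdef')})  # hypothetical non-numerical column

cols = ['A', 'B', 'C', 'D']

# Where a value in A..D is > 0, replace it with the value from the same row
# of column_with_value_I_want; axis=0 aligns the Series along the rows.
df[cols] = df[cols].mask(df[cols] > 0, df['column_with_value_I_want'], axis=0)

print(df)
```

Only the four selected columns are assigned back, so the other columns of df are left as they were.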

