I want to replace values within a range of columns in a dataframe with a corresponding value in another column if the value in the range is greater than zero.
I would think that a simple replace like this would work:
df = df.loc[:,'A':'D'].replace(1, df['column_with_value_I_want'])
But that in fact does nothing as far as I can tell except drop column_with_value_I_want, which is totally unintended, and I'm not sure why that happens.
This doesn't seem to work either:
df[df.loc[:,'A':'D']] > 0 = df['column_with_value_I_want']
It returns the error: SyntaxError: can't assign to comparison.
This seems like it should be straightforward, but I'm at a loss after trying several different things to no avail.
The dataframe I'm working with looks something like this:
import pandas as pd

df = pd.DataFrame({'A' : [1,0,0,1,0,0],
                   'B' : [1,0,0,1,0,1],
                   'C' : [1,0,0,1,0,1],
                   'D' : [1,0,0,1,0,0],
                   'column_with_value_I_want' : [22.0,15.0,90.0,10.,None,557.0],})
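On the first attempt above, a minimal sketch of why the extra column seems to vanish: df.loc[:,'A':'D'] returns a new frame holding only columns A through D, so assigning anything derived from it back to df leaves just those four columns. The names subset, subset_columns and still_has_value_column are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 0, 1, 0, 0],
    'B': [1, 0, 0, 1, 0, 1],
    'C': [1, 0, 0, 1, 0, 1],
    'D': [1, 0, 0, 1, 0, 0],
    'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
})

# The label slice returns a new frame that contains only columns A through D.
subset = df.loc[:, 'A':'D']
subset_columns = list(subset.columns)

# The fifth column is not part of the slice, so rebinding df to a result
# computed from the slice would discard it.
still_has_value_column = 'column_with_value_I_want' in subset.columns
```

Assigning into the selected columns (rather than rebinding df) would keep the remaining column in place.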
Not sure how to do it in Pandas per se, but it's not that difficult if you drop down to numpy.
If you're lucky enough so that your entire DataFrame is numerical, you can do so as follows:
>>> import numpy as np
>>> m = df.to_numpy()  # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
>>> pd.DataFrame(
...     np.where(np.logical_or(np.isnan(m), m > 0), np.tile(m[:, [4]], 5), m),
...     columns=df.columns)
    A    B    C    D  column_with_value_I_want
0  22   22   22   22                        22
1   0    0    0    0                        15
2   0    0    0    0                        90
3  10   10   10   10                        10
4   0    0    0    0                       NaN
5   0  557  557    0                       557
as_matrix (to_numpy in pandas 1.0 and later) converts a DataFrame to a numpy array. np.where is numpy's ternary conditional. np.logical_or is numpy's or. np.isnan checks whether a value is NaN. np.tile tiles (in this case) a 2d single column to a matrix.
Unfortunately, the above will fail if some of your columns (even those not involved in this operation) are inherently non-numerical. In this case, you can do the following:
for col in ['A', 'B', 'C', 'D']:
    # Where the value is positive, take the replacement column; otherwise keep it.
    df[col] = np.where(df[col] > 0, df.column_with_value_I_want, df[col])
which will work as long as the 5 relevant columns are numerical.
This uses a loop (which is frowned upon in numerical Python), but at least it does so over columns, and not rows. Assuming your data is longer than it is wide, it should be OK.
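For a pandas-only variant without the explicit loop, DataFrame.mask with axis=0 might be worth a try. This is a sketch under the assumption that columns A through D are numerical and that the replacement column shares the frame's index; the names cols and subset are introduced here for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 0, 1, 0, 0],
    'B': [1, 0, 0, 1, 0, 1],
    'C': [1, 0, 0, 1, 0, 1],
    'D': [1, 0, 0, 1, 0, 0],
    'column_with_value_I_want': [22.0, 15.0, 90.0, 10.0, None, 557.0],
})

cols = ['A', 'B', 'C', 'D']  # the columns to update (assumed numerical)
subset = df[cols]

# mask() replaces entries where the condition is True. With axis=0 the
# replacement Series is aligned with the rows (by index), so each row
# receives its own value from column_with_value_I_want.
df[cols] = subset.mask(subset > 0, df['column_with_value_I_want'], axis=0)
```

Because only the selected columns are assigned, column_with_value_I_want stays in the frame, and any non-numerical columns elsewhere are never touched. The updated columns will most likely come back as floats, since the replacement values are floats.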