How can we replace the column values of one DataFrame based on a column of another DataFrame, using conditions?

Question

I have two DataFrames, df1 and df2. Both are very large, with 1 million+ rows and 1000 columns. df1 has a column, ColX, that contains characters (as shown below). df2 has 900+ columns, each of which needs to be changed based on df1.

df1:
Index   ColX ColY
 100     C    R
 101     T    Z
 102     A    Y
 ...    ..   ..

df2:
Index    ColA   ColB   ColC   ColD   ...  ...
 100     0.033  0.10   0.22   1.22   ...  ...
 101     1.77   1.34   0.45   1.90   ...  ...
 102     0.88   1.56   1.99   0.99   ...  ...
 ...     ...    ...    ...    ...    ...  ...

The conditions to apply are:

If a value in df2 is >= 0 and < 1.5, replace it with the ColX value for that index.

Else, if a value in df2 is >= 1.5 and <= 2, replace it with the ColY value for that index.

Expected Output:

df2:
Index    ColA   ColB   ColC   ColD   ...  ...
 100      C      C       C      C    ...  ...
 101      Z      T       T      Z    ...  ...
 102      A      Y       Y      A    ...  ...
 ...     ...    ...    ...    ...    ...  ...

I tried this way:

for v in df2.columns.tolist():
    df2 = df2.loc[(df2[v] >= 0) & (df2[v] < 1.5) , v] = df1['ColX']

This sometimes works and sometimes does not (for the first case), and it is very slow. I have a very big file.

Can someone suggest an efficient way to do this? Thanks in advance.

Answer 1

This may be too slow, but it yields the desired result:

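for v in df2.columns:
    # True where the value falls in [0, 1.5)
    ok = (df2[v] >= 0) & (df2[v] < 1.5)
    df2.loc[ok, v] = df1.loc[ok, 'ColX']
    # Every other value, not only those in [1.5, 2], gets ColY
    df2.loc[~ok, v] = df1.loc[~ok, 'ColY']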
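
A variant of the same loop is sketched here. It tests the second range explicitly and writes into a separate object-dtype frame instead of overwriting the float columns in place. The sample data is copied from the question; the names result, in_x and in_y are made up for illustration, and the sketch assumes df1 and df2 share the same index.

```python
import pandas as pd

# Sample frames from the question (only the rows and columns shown there).
df1 = pd.DataFrame(
    {"ColX": ["C", "T", "A"], "ColY": ["R", "Z", "Y"]},
    index=[100, 101, 102],
)
df2 = pd.DataFrame(
    {
        "ColA": [0.033, 1.77, 0.88],
        "ColB": [0.10, 1.34, 1.56],
        "ColC": [0.22, 0.45, 1.99],
        "ColD": [1.22, 1.90, 0.99],
    },
    index=[100, 101, 102],
)

# Empty object-dtype frame with the same shape as df2 (all NaN to start).
result = pd.DataFrame(index=df2.index, columns=df2.columns, dtype=object)

for col in df2.columns:
    in_x = (df2[col] >= 0) & (df2[col] < 1.5)   # first condition
    in_y = (df2[col] >= 1.5) & (df2[col] <= 2)  # second condition
    # Assignment aligns on the index, so each row gets its own label.
    result.loc[in_x, col] = df1.loc[in_x, "ColX"]
    result.loc[in_y, col] = df1.loc[in_y, "ColY"]

print(result)
```

Values that match neither condition stay NaN in result rather than silently receiving ColY. It is still one pass per column, so it is unlikely to be fast on 900+ columns.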

Answer 2

If both DataFrames have the same index, use numpy.select and let broadcasting repeat the ColX and ColY values across the columns:

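import numpy as np
import pandas as pd

arr = df2.values
# boolean masks for the two ranges; np.select uses the first one that matches
m1 = (arr >= 0) & (arr < 1.5)
m2 = (arr >= 1.5) & (arr <= 2)

# column vectors of shape (n, 1), broadcast across all columns of df2
a1 = df1['ColX'].values[:, None]
a2 = df1['ColY'].values[:, None]

# values matching neither mask get np.select's default of 0
df = pd.DataFrame(np.select([m1, m2], [a1, a2]), index=df2.index, columns=df2.columns)
print(df)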
    ColA ColB ColC ColD
100    C    C    C    C
101    Z    T    T    Z
102    A    Y    Y    A
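
For reference, the numpy.select approach can be put together as a self-contained sketch using the sample data from the question. The helper name replace_by_range is hypothetical, and passing default=np.nan for out-of-range values is an assumption, since the question does not say what should happen to them. The labels are reindexed to df2's index first, so the sketch does not depend on the two frames being in the same row order.

```python
import numpy as np
import pandas as pd

# Sample frames from the question (only the rows and columns shown there).
df1 = pd.DataFrame(
    {"ColX": ["C", "T", "A"], "ColY": ["R", "Z", "Y"]},
    index=[100, 101, 102],
)
df2 = pd.DataFrame(
    {
        "ColA": [0.033, 1.77, 0.88],
        "ColB": [0.10, 1.34, 1.56],
        "ColC": [0.22, 0.45, 1.99],
        "ColD": [1.22, 1.90, 0.99],
    },
    index=[100, 101, 102],
)


def replace_by_range(values, labels):
    """Hypothetical helper: map each number in `values` to a row label.

    [0, 1.5) -> ColX, [1.5, 2] -> ColY, anything else -> NaN (assumed).
    """
    labels = labels.reindex(values.index)  # align rows with `values`
    arr = values.to_numpy()
    m1 = (arr >= 0) & (arr < 1.5)
    m2 = (arr >= 1.5) & (arr <= 2)
    # Shape (n, 1) so each row's label broadcasts across all columns.
    a1 = labels["ColX"].to_numpy(dtype=object)[:, None]
    a2 = labels["ColY"].to_numpy(dtype=object)[:, None]
    out = np.select([m1, m2], [a1, a2], default=np.nan)
    return pd.DataFrame(out, index=values.index, columns=values.columns)


out = replace_by_range(df2, df1)
print(out)
```

Because the masks and the selection are computed on the whole array at once, there is no Python-level loop over the columns.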

2 answers

solution1
0 2019-02-12 08:08:47

solution2
0 ACCEPTED 2019-02-12 08:16:01