How to create new column conditional on existing columns in pandas dataframe using for loop

Question

I have a dataset of two columns and I want to create a third column that says whether the values of the first two columns are identical, and names the identical value for each row.

Example data:

import pandas as pd

data = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'], 
        'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
        'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'] }
df1 = pd.DataFrame(data)
cols = ['Colour_mix', 'Colour_1', 'Colour_2']
df1 = df1[cols] 
df1

What I want to end up with looks like this:

data2 = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'], 
        'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
        'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
        'Pairwise_match': ['red', 'False', 'red', 'False', 'green', 'False', 'green', 'red', 'False', 'blue']}
df2 = pd.DataFrame(data2)
cols2 = ['Colour_mix', 'Colour_1', 'Colour_2', 'Pairwise_match']
df2 = df2[cols2] 
df2

That is, a new column is added which states whether the Colour_1 and Colour_2 columns match and, where they do, what the shared value is (red, blue or green).

My approach so far was to create an ordered dict of boolean arrays marking where the Colour_1 and Colour_2 columns matched. I was hoping to then write a loop that, on each iteration: 1. changed the "True" values of the boolean array to the value of the match (red, blue or green), and 2. merged the resulting matches into a single column.

My code so far:

import collections

# Create a dict of boolean arrays, one per colour: True where both columns hold that colour
colour_matches = collections.OrderedDict()

colour_matches['red'] = ( (df1['Colour_1']=='red')
                      & (df1['Colour_2']=='red')
                      )

colour_matches['blue'] = ( (df1['Colour_1']=='blue')
                      & (df1['Colour_2']=='blue')
                      )

colour_matches['green'] = ( (df1['Colour_1']=='green')
                      & (df1['Colour_2']=='green')
                      )

# Add pairwise match columns

for p in colour_matches:
    print(p)
    _matches_df = pd.DataFrame(colour_matches[p])
    _matches_df.columns = ['Pairwise_match']
    df_new = pd.concat([df1, _matches_df], axis=1)

I'm having two problems:

1. I can't figure out how to change the values of the boolean arrays within the loop so that "True" is replaced with the shared value of the two colour columns (red, blue or green).
2. My loop overwrites Pairwise_match on each iteration, so the matching rows for the earlier colours (red and blue) are lost and only green is shown. I was hoping to end up with three columns of pairwise matches (i.e. to append a column on each pass of the loop), which I could then merge into my single desired column.

Many thanks.

Answer 1

Use numpy.where with a boolean mask that compares the two columns:

import numpy as np
df1['Pairwise_match'] = np.where(df1['Colour_1'] == df1['Colour_2'], df1['Colour_1'], False)
print (df1)
  Colour_mix Colour_1 Colour_2 Pairwise_match
0          1      red      red            red
1          2     blue    green          False
2          3      red      red            red
3          4      red     blue          False
4          5    green    green          green
5          6    green      red          False
6          7    green    green          green
7          8      red      red            red
8          9     blue    green          False
9         10     blue     blue           blue

Detail:

print (df1['Colour_1'] == df1['Colour_2'])
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7     True
8    False
9     True
dtype: bool
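
The same boolean-mask idea can also finish the loop described in the question. The following is only a sketch, not a replacement for the one-liner above: it assumes each row can match at most one colour, and the variable name `pairwise` is made up for illustration. It starts from a column of False and, for each colour, writes the colour name wherever that colour's mask is True, so earlier matches are not overwritten.

```python
import collections

import pandas as pd

df1 = pd.DataFrame({
    'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# One boolean mask per colour: True where both columns hold that colour
colour_matches = collections.OrderedDict()
for colour in ['red', 'blue', 'green']:
    colour_matches[colour] = (df1['Colour_1'] == colour) & (df1['Colour_2'] == colour)

# Hypothetical helper Series; object dtype lets it hold both False and strings
pairwise = pd.Series(False, index=df1.index, dtype=object)

# Fill in the colour name wherever its mask is True; other rows keep their value
for colour, mask in colour_matches.items():
    pairwise.loc[mask] = colour

df1['Pairwise_match'] = pairwise
print(df1)
```

Note that the non-matching rows hold the boolean False, not the string 'False' shown in the question's desired output. If the string is needed, use 'False' as the starting value instead.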

Answer 2

A simpler approach might be:

df1["Pairwise_match"] = False
df1.loc[df1.Colour_1 == df1.Colour_2, "Pairwise_match"] = df1.Colour_1[df1.Colour_1 == df1.Colour_2]

This creates a column filled with False and then, in the rows where the two colour columns match, replaces False with the matching colour.
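
One caveat, stated as an assumption about your installed version: the column starts out with boolean dtype, and some pandas releases warn about (or may refuse) writing strings into it. A minimal sketch of the same two-step idea that avoids the dtype mismatch by creating the column with object dtype first; the variable name `same` is made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# Mask of rows where the two colour columns agree
same = df1['Colour_1'] == df1['Colour_2']

# Start with False everywhere, stored as object so strings can be written later
df1['Pairwise_match'] = pd.Series(False, index=df1.index, dtype=object)

# Overwrite only the matching rows with the shared colour
df1.loc[same, 'Pairwise_match'] = df1.loc[same, 'Colour_1']
print(df1)
```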

2 answers

solution1
3 ACCEPTED 2018-10-04 06:14:07

solution2
2 2018-10-04 06:16:04
