I have a dataset of two columns and I want to create a third column that says whether the values of the first two columns are identical, and names the identical value for each row.
Example data:
import pandas as pd
data = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'],
'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'] }
df1 = pd.DataFrame(data)
cols = ['Colour_mix', 'Colour_1', 'Colour_2']
df1 = df1[cols]
df1
What I want to end up with looks like this:
data2 = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'],
'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
'Pairwise_match': ['red', 'False', 'red', 'False', 'green', 'False', 'green', 'red', 'False', 'blue']}
df2 = pd.DataFrame(data2)
cols2 = ['Colour_mix', 'Colour_1', 'Colour_2', 'Pairwise_match']
df2 = df2[cols2]
df2
That is, a new column is added that states firstly whether the Colour_1 and Colour_2 columns match, and secondly what the shared value is (red, blue or green).
My approach so far was to create an ordered dict of boolean arrays marking where the Colour_1 and Colour_2 columns match. I was hoping to then write a loop that: 1. changes each "True" in the boolean array to the value of the match (red, blue or green), and 2. merges the resulting matches into a single column.
My code so far:
import collections

# Build one boolean mask per colour: True where both columns hold that colour
colour_matches = collections.OrderedDict()
colour_matches['red'] = ((df1['Colour_1'] == 'red')
                         & (df1['Colour_2'] == 'red'))
colour_matches['blue'] = ((df1['Colour_1'] == 'blue')
                          & (df1['Colour_2'] == 'blue'))
colour_matches['green'] = ((df1['Colour_1'] == 'green')
                           & (df1['Colour_2'] == 'green'))

# Add pairwise match columns
for p in colour_matches:
    print(p)
    _matches_df = pd.DataFrame(colour_matches[p])
    _matches_df.columns = ['Pairwise_match']
    # df_new is rebuilt from df1 on every pass, so only the last colour survives
    df_new = pd.concat([df1, _matches_df], axis=1)
Two problems I'm having:
1. I can't figure out how to change the values of the boolean arrays within the loop so that "True" is replaced conditionally with the shared value of the two colour columns (red, blue or green).
2. My loop currently overwrites Pairwise_match on each pass, so the matching rows for the earlier colours (red and blue) are lost and only green is shown. I was hoping to end up with three columns of pairwise matches (i.e. to append a column on each run of the loop), which I could then merge into my single desired column.
Many thanks.
Use numpy.where with a boolean mask that compares the two columns:
import numpy as np

# Where the colours match, take the value from Colour_1; otherwise use False
df1['Pairwise_match'] = np.where(df1['Colour_1'] == df1['Colour_2'], df1['Colour_1'], False)
print(df1)
  Colour_mix Colour_1 Colour_2 Pairwise_match
0          1      red      red            red
1          2     blue    green          False
2          3      red      red            red
3          4      red     blue          False
4          5    green    green          green
5          6    green      red          False
6          7    green    green          green
7          8      red      red            red
8          9     blue    green          False
9         10     blue     blue           blue
Detail:
print(df1['Colour_1'] == df1['Colour_2'])
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7     True
8    False
9     True
dtype: bool
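For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same numpy.where idea. The conversion to an object array and the extra column name Pairwise_match_str are my own additions, not part of the answer above. The second column is only there because the expected frame in the question stores non-matches as the string 'False' rather than the boolean False.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Colour_mix": [str(i) for i in range(1, 11)],
    "Colour_1": ["red", "blue", "red", "red", "green",
                 "green", "green", "red", "blue", "blue"],
    "Colour_2": ["red", "green", "red", "blue", "green",
                 "red", "green", "red", "green", "blue"],
})

# Row-wise comparison of the two colour columns (boolean Series).
mask = df1["Colour_1"] == df1["Colour_2"]

# Assumption: convert to a plain object array first, so strings and a
# boolean can share one result array regardless of the column's dtype.
colours = df1["Colour_1"].to_numpy(dtype=object)

# Shared colour where the mask is True, boolean False elsewhere.
df1["Pairwise_match"] = np.where(mask, colours, False)

# Hypothetical extra column: same logic, but with the string 'False'
# for non-matches, as in the question's expected frame.
df1["Pairwise_match_str"] = np.where(mask, colours, "False")

print(df1)
```

Mixing strings with a boolean leaves Pairwise_match holding two kinds of value, so if the column will be filtered or compared later it is worth deciding up front whether non-matches should be False or 'False'.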
A simpler approach might be:
df1["Pairwise_match"] = False
df1.loc[df1.Colour_1 == df1.Colour_2, "Pairwise_match"] = df1.Colour_1[df1.Colour_1 == df1.Colour_2]
This creates a column filled with False and then, in the rows where the two colour columns match, replaces False with the shared colour.
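If you would rather keep the per-colour masks from the question, one possible way to merge them into a single column is numpy.select, which takes a list of conditions and a matching list of values. This is only a sketch and not something either answer above proposed. A plain dict stands in for the OrderedDict, and I use the string 'False' as the default so that every value in the result is a string, which is an assumption on my part.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Colour_mix": [str(i) for i in range(1, 11)],
    "Colour_1": ["red", "blue", "red", "red", "green",
                 "green", "green", "red", "blue", "blue"],
    "Colour_2": ["red", "green", "red", "blue", "green",
                 "red", "green", "red", "green", "blue"],
})

# One boolean mask per colour, as in the question
# (a plain dict preserves insertion order).
colour_matches = {
    colour: (df1["Colour_1"] == colour) & (df1["Colour_2"] == colour)
    for colour in ["red", "blue", "green"]
}

# For each row, np.select returns the label of the first mask that is
# True; rows where no mask is True get the default value.
df1["Pairwise_match"] = np.select(
    [m.to_numpy() for m in colour_matches.values()],
    list(colour_matches.keys()),
    default="False",
)

print(df1)
```

This avoids the loop entirely, so nothing gets overwritten between colours. The trade-off is that the colours have to be listed by hand, which the direct column comparison does not need.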