How to create a new column conditional on existing columns in a pandas DataFrame using a for loop
I have a dataset with two colour columns, and I want to create a third column that says whether the values in those two columns are identical and, if they are, names the shared value for each row.
Example data:
import pandas as pd
data = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'],
'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'] }
df1 = pd.DataFrame(data)
cols = ['Colour_mix', 'Colour_1', 'Colour_2']
df1 = df1[cols]
df1
What I want to end up with looks like this:
data2 = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'],
'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
'Pairwise_match': ['red', 'False', 'red', 'False', 'green', 'False', 'green', 'red', 'False', 'blue']}
df2 = pd.DataFrame(data2)
cols2 = ['Colour_mix', 'Colour_1', 'Colour_2', 'Pairwise_match']
df2 = df2[cols2]
df2
i.e. a new column is added that states, first, whether the Colour_1 and Colour_2 columns match and, second, what the shared value is (red, blue or green).
My approach so far was to create an ordered dict of boolean arrays marking where the Colour_1 and Colour_2 columns match, and then to write a loop that iteratively:
1. changes each "True" in the boolean array to the matching value, i.e. red, blue or green, and
2. merges the resulting matches into a single column.
My code so far:
import collections

# Create an ordered dict of boolean Series, one per colour,
# marking the rows where both columns hold that colour
colour_matches = collections.OrderedDict()
colour_matches['red'] = ( (df1['Colour_1']=='red')
                          & (df1['Colour_2']=='red')
                        )
colour_matches['blue'] = ( (df1['Colour_1']=='blue')
                           & (df1['Colour_2']=='blue')
                         )
colour_matches['green'] = ( (df1['Colour_1']=='green')
                            & (df1['Colour_2']=='green')
                          )

# Add pairwise match columns
for p in colour_matches:
    print(p)
    _matches_df = pd.DataFrame(colour_matches[p])
    _matches_df.columns = ['Pairwise_match']
    # df_new is rebuilt from df1 on every pass, so only the last colour (green) survives
    df_new = pd.concat([df1, _matches_df], axis=1)
Two problems I'm having:
1. I can't figure out how to change the values of the boolean arrays within the loop so that "True" is replaced with the shared value of the two colour columns (red, blue or green).
2. My loop currently overwrites Pairwise_match on each iteration, so the matching rows for the earlier colours (red and blue) are lost and only green is shown. I was hoping to end up with three columns of pairwise matches (i.e. adding/appending a column on each run of the loop), which I could then merge into my single desired column.
Many thanks.
Use numpy.where with a boolean mask that compares both columns:
import numpy as np

df1['Pairwise_match'] = np.where(df1['Colour_1'] == df1['Colour_2'], df1['Colour_1'], False)
print(df1)
Colour_mix Colour_1 Colour_2 Pairwise_match
0 1 red red red
1 2 blue green False
2 3 red red red
3 4 red blue False
4 5 green green green
5 6 green red False
6 7 green green green
7 8 red red red
8 9 blue green False
9 10 blue blue blue
Detail:
print (df1['Colour_1'] == df1['Colour_2'])
0 True
1 False
2 True
3 False
4 True
5 False
6 True
7 True
8 False
9 True
dtype: bool
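If you would rather stay within pandas, the same column can be sketched with Series.where, which keeps each value where the condition is True and substitutes `other` everywhere else. This is an alternative to the numpy.where call, not part of the original answer; the frame below holds only the two colour columns for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# Same boolean mask as in the "Detail" output
same = df1['Colour_1'] == df1['Colour_2']

# Keep Colour_1 where the two columns agree, put False everywhere else
df1['Pairwise_match'] = df1['Colour_1'].where(same, False)
print(df1)
```

Leaving out `other` fills the non-matching rows with NaN instead of False, which works better with `isna()` or `dropna()` later on.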
A simpler approach might be:
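# object dtype lets the column hold both False and colour strings without a dtype-upcast warning
df1["Pairwise_match"] = pd.Series(False, index=df1.index, dtype=object)
match = df1.Colour_1 == df1.Colour_2
df1.loc[match, "Pairwise_match"] = df1.Colour_1[match]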
This creates a column full of False and then, wherever the colours in the two columns match, replaces False with that colour.
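For completeness, the loop approach from the question can also be repaired. The following is a minimal sketch, not taken from either answer, that fixes both problems: it writes the colour name itself instead of True, and it updates one Pairwise_match column in place instead of concatenating a fresh column on each pass. It fills non-matching rows with the string 'False' to mirror the desired df2, and it assumes red, blue and green are the only colours.

```python
import collections

import pandas as pd

data = {'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
        'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
        'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue']}
df1 = pd.DataFrame(data)

# One boolean mask per colour, built the same way as in the question
colour_matches = collections.OrderedDict()
for colour in ['red', 'blue', 'green']:  # assumption: these are the only colours
    colour_matches[colour] = (df1['Colour_1'] == colour) & (df1['Colour_2'] == colour)

# A single result column; the string 'False' mirrors the desired df2
df1['Pairwise_match'] = 'False'
for colour, mask in colour_matches.items():
    # Write the colour name (not True) into the matching rows of the same column,
    # so matches found for earlier colours are kept rather than overwritten
    df1.loc[mask, 'Pairwise_match'] = colour

print(df1)
```

Because every colour writes into the same column, there is no need for three intermediate columns and a later merge step.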