
How to create a new column conditional on existing columns in a pandas DataFrame using a for loop

I have a dataset with two columns, and I want to create a third column that says whether the values in the first two columns are identical and, if so, names the shared value for each row.

Example data:

import pandas as pd

data = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'], 
        'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
        'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'] }
df1 = pd.DataFrame(data)
# Fix the column order (older pandas versions sorted dict keys alphabetically)
cols = ['Colour_mix', 'Colour_1', 'Colour_2']
df1 = df1[cols]
df1

What I want to end up with looks like this:

data2 = {'Colour_mix': ['1','2', '3', '4', '5', '6', '7', '8', '9', '10'], 
        'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
        'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
        'Pairwise_match': ['red', 'False', 'red', 'False', 'green', 'False', 'green', 'red', 'False', 'blue']}
df2 = pd.DataFrame(data2)
cols2 = ['Colour_mix', 'Colour_1', 'Colour_2', 'Pairwise_match']
df2 = df2[cols2] 
df2 

i.e. a new column is added that states, first, whether the Colour_1 and Colour_2 columns match and, second, what the shared value is (red, blue or green).

My approach so far was to create an ordered dict of boolean arrays marking where the Colour_1 and Colour_2 columns match, and then to write a loop that iteratively: 1. changes each "True" in the boolean arrays to the matched value, i.e. red, blue or green, and 2. merges the resulting matches into a single column.

My code so far:

import collections

# Create an ordered dict of boolean arrays, one per colour match pair
colour_matches = collections.OrderedDict()

colour_matches['red'] = ( (df1['Colour_1']=='red')
                      & (df1['Colour_2']=='red')
                      )

colour_matches['blue'] = ( (df1['Colour_1']=='blue')
                      & (df1['Colour_2']=='blue')
                      )

colour_matches['green'] = ( (df1['Colour_1']=='green')
                      & (df1['Colour_2']=='green')
                      )

# Add pairwise match columns
# (df_new is rebuilt from df1 on every pass, so only the last mask, green, survives)
for p in colour_matches:
    print(p)
    _matches_df = pd.DataFrame(colour_matches[p])
    _matches_df.columns = ['Pairwise_match']
    df_new = pd.concat([df1, _matches_df], axis=1)

Two problems I'm having:

1. I can't figure out how to change the values of the boolean arrays within the loop so that "True" is replaced conditionally with the shared value of the two colour columns (red, blue or green).
2. My loop currently overwrites Pairwise_match on each iteration, so the information on matching rows for the previous colours (red and blue) is lost and only green is shown. I was hoping to end up with three columns of pairwise matches (i.e. to add/append a column on each run of the loop), which I could then merge into my single desired column.

Many thanks.
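For comparison, the loop-based idea from the question can also be made to work. This is a minimal sketch, not the asker's code: it rebuilds the example data so it runs on its own, uses `Series.where` to turn each mask into the colour name (NaN where the mask is False), keeps one column per colour, and then merges them row-wise with `combine_first`. The column names `match_red`, `match_blue` and `match_green` are hypothetical.

```python
import collections
from functools import reduce

import pandas as pd

# Same example data as in the question
df1 = pd.DataFrame({
    'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# One boolean mask per colour: True where both columns hold that colour
colour_matches = collections.OrderedDict()
for colour in ['red', 'blue', 'green']:
    colour_matches[colour] = (df1['Colour_1'] == colour) & (df1['Colour_2'] == colour)

# Problem 1: turn each mask into the colour name where True, NaN where False.
# Problem 2: keep one column per colour instead of overwriting a single column.
matches_df = pd.DataFrame({
    'match_' + colour: pd.Series(colour, index=df1.index, dtype=object).where(mask)
    for colour, mask in colour_matches.items()
})

# Merge the three columns: the masks never overlap, so taking the first
# non-null value per row gives the shared colour; rows with no match get False.
merged = reduce(lambda left, right: left.combine_first(right),
                (matches_df[name] for name in matches_df.columns))
df1['Pairwise_match'] = merged.fillna(False)
print(df1)
```

The vectorised answers below avoid the per-colour masks entirely, which is simpler and does not require knowing the set of colours in advance.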

Use numpy.where with a boolean mask that compares both columns:

import numpy as np
# Where the two colours are equal take Colour_1, otherwise False
df1['Pairwise_match'] = np.where(df1['Colour_1'] == df1['Colour_2'], df1['Colour_1'], False)
print(df1)
  Colour_mix Colour_1 Colour_2 Pairwise_match
0          1      red      red            red
1          2     blue    green          False
2          3      red      red            red
3          4      red     blue          False
4          5    green    green          green
5          6    green      red          False
6          7    green    green          green
7          8      red      red            red
8          9     blue    green          False
9         10     blue     blue           blue

Detail:

print(df1['Colour_1'] == df1['Colour_2'])
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7     True
8    False
9     True
dtype: bool
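The same mask can also be applied without NumPy. A minimal sketch using pandas' own `Series.where`, which keeps values where the condition is True and substitutes `other` elsewhere; the example data is rebuilt so the block runs standalone, and the `astype(object)` call is an added precaution so the column can hold both strings and the boolean False.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# Boolean mask: True where both colour columns hold the same value
same = df1['Colour_1'] == df1['Colour_2']

# Keep Colour_1 where the mask is True, otherwise put False
# (object dtype so strings and the boolean False can share one column)
df1['Pairwise_match'] = df1['Colour_1'].astype(object).where(same, False)
print(df1)
```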

A simpler approach might be:

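same = df1.Colour_1 == df1.Colour_2
# object dtype so the column can hold both False and colour strings (recent pandas warns on bool -> str)
df1["Pairwise_match"] = pd.Series(False, index=df1.index, dtype=object)
df1.loc[same, "Pairwise_match"] = df1.Colour_1[same]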

This creates a column full of False and then, wherever the colours in the two columns match, replaces those entries with the shared colour value.
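To check the result against the desired output from the question: the question's `df2` stores the non-matching rows as the string 'False', while this approach stores the boolean False, so converting with `astype(str)` lines the two up. A minimal sketch, assuming the same example data and the object-dtype version of the `.loc` approach; `expected` and `matches_expected` are names introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Colour_mix': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Colour_1': ['red', 'blue', 'red', 'red', 'green', 'green', 'green', 'red', 'blue', 'blue'],
    'Colour_2': ['red', 'green', 'red', 'blue', 'green', 'red', 'green', 'red', 'green', 'blue'],
})

# Pairwise_match column of the desired df2 from the question
expected = ['red', 'False', 'red', 'False', 'green', 'False', 'green', 'red', 'False', 'blue']

# .loc-based approach with an object-dtype starting column
same = df1['Colour_1'] == df1['Colour_2']
df1['Pairwise_match'] = pd.Series(False, index=df1.index, dtype=object)
df1.loc[same, 'Pairwise_match'] = df1.loc[same, 'Colour_1']

# astype(str) turns the boolean False into the string 'False'
matches_expected = df1['Pairwise_match'].astype(str).tolist() == expected
print(matches_expected)
```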
