
Finding rows with same column values in pandas dataframe

I have two dataframes with different numbers of columns, where four columns can have the same values in both dataframes. I want to make a new column in df1 that takes the value 1 if there is a row in df2 that has the same values for columns 'A', 'B', 'C', and 'D' as a row in df1. If there isn't such a row, I want the value to be 0. Columns 'E' and 'F' are not important for checking the values.

Is there a pandas function that can do this, or do I have to do this in a loop?

For example:

df1 =
A    B    C    D    E    F
1    1    20   20   3    2
1    1    12   14   1    3
2    1    13   43   4    3
2    2    12   34   1    4

df2 =
A    B    C    D    E    
1    3    12   14   2    
1    1    20   20   4   
2    2    21   31   5    
2    2    12   34   8    

Expected output:

df1 =
A    B    C    D    E    F    Target
1    1    20   20   3    2    1
1    1    12   14   1    3    0
2    1    13   43   4    3    0
2    2    12   34   1    4    1

This is fairly simple. If you compare two DataFrames with ==, each element is compared with the element at the same position in the other frame, so a row in df1 is flagged only when the row with the same index in df2 matches.

col_list = ['A', 'B', 'C', 'D']
idx = (df1.loc[:,  col_list] == df2.loc[:,  col_list]).all(axis=1)

df1['new_row'] = idx.astype(int)
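
To see what the comparison above does, here is a minimal sketch that applies it to the sample frames from the question. The variable name `positional` is only illustrative. It assumes both frames share the same index and the same four columns, which `==` requires. Because rows are compared index by index, the matching row pair (1, 1, 20, 20), which sits at different positions in df1 and df2, is not flagged, so the result differs from the Target column the question asks for.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 1, 1, 2],
    "C": [20, 12, 13, 12],
    "D": [20, 14, 43, 34],
    "E": [3, 1, 4, 1],
    "F": [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 1, 2, 2],
    "C": [12, 20, 21, 12],
    "D": [14, 20, 31, 34],
    "E": [2, 4, 5, 8],
})

col_list = ["A", "B", "C", "D"]

# == aligns on index and column labels, so row i of df1 is
# compared only with row i of df2, not with every row of df2.
idx = (df1.loc[:, col_list] == df2.loc[:, col_list]).all(axis=1)
positional = idx.astype(int)

print(positional.tolist())  # [0, 0, 0, 1]
```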

I think you need merge with a left join and the parameter indicator=True, then compare the _merge column with eq (same as ==), and finally convert the booleans True and False to 1 and 0 with astype:

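import pandas as pd

cols = list('ABCD')
# left join on the shared columns; _merge is 'both' where df2 has a matching row
df1['Target'] = pd.merge(df1[cols],
                         df2[cols], how='left', indicator=True)['_merge'].eq('both').astype(int)
print(df1)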

   A  B   C   D  E  F  Target
0  1  1  20  20  3  2       1
1  1  1  12  14  1  3       0
2  2  1  13  43  4  3       0
3  2  2  12  34  1  4       1

Detail:

print(pd.merge(df1[cols], df2[cols], how='left', indicator=True))
   A  B   C   D     _merge
0  1  1  20  20       both
1  1  1  12  14  left_only
2  2  1  13  43  left_only
3  2  2  12  34       both
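
One situation the merge above does not cover is df2 containing the same A, B, C, D combination more than once: a left join then repeats the matching df1 row, and the merged result no longer lines up with df1. The sketch below is one possible way to guard against that, not part of the original answer. The helper name `add_target` is hypothetical, and the extra duplicate row in df2 is added here purely for illustration.

```python
import pandas as pd


def add_target(df1, df2, cols, name="Target"):
    """Return a copy of df1 with a 0/1 column that is 1 where the
    values in `cols` also occur together in some row of df2."""
    # Drop repeated key combinations so the left join cannot add rows.
    right = df2[cols].drop_duplicates()
    merged = df1[cols].merge(right, on=cols, how="left", indicator=True)
    out = df1.copy()
    # to_numpy() assigns by position, so df1 may carry any index.
    out[name] = merged["_merge"].eq("both").astype(int).to_numpy()
    return out


df1 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 1, 1, 2],
    "C": [20, 12, 13, 12],
    "D": [20, 14, 43, 34],
    "E": [3, 1, 4, 1],
    "F": [2, 3, 3, 4],
})
# df2 from the question plus one duplicated key row (illustrative only).
df2 = pd.DataFrame({
    "A": [1, 1, 2, 2, 1],
    "B": [3, 1, 2, 2, 1],
    "C": [12, 20, 21, 12, 20],
    "D": [14, 20, 31, 34, 20],
    "E": [2, 4, 5, 8, 9],
})

result = add_target(df1, df2, list("ABCD"))
print(result["Target"].tolist())  # [1, 0, 0, 1]
```

The helper returns a copy rather than modifying df1 in place; assigning the column directly, as in the answer above, works equally well when df2 has no duplicate key rows and df1 has a default index.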

You can use logical operators for that. You can have a look at "Logic operator for boolean indexing in Pandas" or "Element-wise logical OR in Pandas" for some ideas.

But your specification is not sufficient for sketching a solution, because I do not know how the rows in df1 should relate to the rows in df2. Is it that both frames have the same number of rows, and each row in df1 should get a column holding a boolean that says whether A, B, C, and D are the same in the row at the same position in df2?
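
Under the interpretation raised above (same number of rows, compared position by position), the logical-operator approach could look like the following sketch. It only illustrates the element-wise `&` operator on the key columns of the question's sample data. The names `same_position`, `flags`, and `and_failed` are made up for this example.

```python
import pandas as pd

# Key columns of the sample frames from the question.
df1 = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 1, 1, 2],
                    "C": [20, 12, 13, 12], "D": [20, 14, 43, 34]})
df2 = pd.DataFrame({"A": [1, 1, 2, 2], "B": [3, 1, 2, 2],
                    "C": [12, 20, 21, 12], "D": [14, 20, 31, 34]})

# Element-wise AND. Each comparison needs its own parentheses
# because & binds more tightly than ==.
same_position = (
    (df1["A"] == df2["A"])
    & (df1["B"] == df2["B"])
    & (df1["C"] == df2["C"])
    & (df1["D"] == df2["D"])
)
flags = same_position.astype(int).tolist()
print(flags)  # [0, 0, 0, 1]

# Python's `and` needs a single truth value, which a Series
# cannot provide, so it raises ValueError instead.
try:
    (df1["A"] == df2["A"]) and (df1["B"] == df2["B"])
    and_failed = False
except ValueError:
    and_failed = True
```

This flags only rows that match at the same position; checking each df1 row against every row of df2, as the question's expected output requires, needs a different approach such as a merge.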
