Selecting data with Pandas based on multiple conditions

Question

I am new to using Pandas. I want to select rows from a dataframe where multiple columns match in value. Along the lines of:

if column A equals column AB and column B equals column BC

then I want those values.

I haven't actually used an if statement; I read that iteration is not a good fit for pandas.

I've tried to find a solution, but I'm not sure whether the problem is my syntax or whether pandas is unhappy with the different data types of the columns.

My code is a little long, so I'll provide just the line where I attempt the selection, but I can post the entire code if that is helpful.

dfequal=dfMerged.loc[(dfMerged['MetCode']==dfMerged['GCD_METCODE']) & (dfMerged[dfMerged['Zone Code']==dfMerged['GCD_Senior_ZONE']]) & (dfMerged[dfMerged['Municipality Code']==dfMerged['GCD_CSDUID']])]

Edit:

The expected output would be a dataframe containing only the rows where the statement is true.

This is the error:
ValueError: operands could not be broadcast together with shapes (84778,) (4462,)

This is the data table I'm pulling from:

Sample Data

FileID,MetCode,Municipality Code,Zone Code,GCD_Senior_ZONE,GCD_METCODE,GCD_CSDUID
A100101,7175,1005018,303006,303006,7175,1005018
A100102,7175,1005018,303006,303006,7175,1005018
A100103,7175,1005018,303006,303006,7175,1005018
A100104,7280,1006009,202003,202003,7280,1006009
A100105,7300,1006017,202003,202003,7300,1006017
A100108,7300,1006017,202003,202003,7300,1006017
A100109,7300,1006017,202003,202003,7300,1006017
A100110,1640,1001485,101001,101001,1640,1001485
A100111,1640,1001517,101001,101001,1640,1001517
A100114,9000,1008011,202003,202003,0,1008011
A100115,9000,1001370,101002,101002,0,1001370
A100119,9000,1003034,202003,202003,0,1003034

Answer 1

You simply need to wrap each condition in parentheses inside your .loc, without repeating a DataFrame filter inside the filter:

First, create a crude data sample, since you didn't provide one besides the image:

import pandas as pd

# creating the values: the first one is the ID, the next 4 are the values to compare
check_values = [
    [1, 5, 10, 20, 30],
    [2, 5, 11, 32, 11],
    [3, 10, 10, 20, 20],
    [4, 9, 9, 11, 11],
    [5, 11, 23, 41, 11]
]

# creating columns names
check_cols = ['id', 'A', 'B', 'C', 'D']

# making the DataFrame
dfcheck = pd.DataFrame(check_values, columns=check_cols)

# Setting the id column, just because
dfcheck.set_index('id', inplace=True)

The solution, where you need to wrap each condition in parentheses:

dfcheck.loc[(dfcheck['A'] == dfcheck['B']) & (dfcheck['C'] == dfcheck['D'])]
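
As a minimal sketch of the same idea, the mask can be built on its own and inspected before it is passed to .loc. The variable names mask and result are only illustrative and are not part of the original answer. The parentheses matter because & binds more tightly than == in Python.

```python
import pandas as pd

# Same crude sample as above: an id plus four values to compare.
check_values = [
    [1, 5, 10, 20, 30],
    [2, 5, 11, 32, 11],
    [3, 10, 10, 20, 20],
    [4, 9, 9, 11, 11],
    [5, 11, 23, 41, 11],
]
dfcheck = pd.DataFrame(check_values, columns=['id', 'A', 'B', 'C', 'D'])
dfcheck = dfcheck.set_index('id')

# Each comparison yields a boolean Series; & combines them row by row.
mask = (dfcheck['A'] == dfcheck['B']) & (dfcheck['C'] == dfcheck['D'])

# Only the rows where both conditions hold are kept (ids 3 and 4 here).
result = dfcheck.loc[mask]
print(result)
```

Keeping the mask in a variable makes it easy to check its length and dtype when a filter misbehaves.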

EDIT: What went wrong in your code:

Looking at your filter, you are adding an unnecessary dfMerged[...] inside your parentheses. Here is your code broken into lines (delete everything inside "** CODE **"):

dfequal=
dfMerged.loc[(dfMerged['MetCode']==dfMerged['GCD_METCODE']) 
& (**dfMerged[**dfMerged['Zone Code']==dfMerged['GCD_Senior_ZONE']**]**) 
& (**dfMerged[**dfMerged['Municipality Code']==dfMerged['GCD_CSDUID']**]**)]

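So you see, you are filtering inside a filter, which is not needed. The inner dfMerged[...] returns a filtered DataFrame rather than a boolean mask, so it does not line up with the first condition. It should be: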

dfequal = dfMerged.loc[
    (dfMerged['MetCode'] == dfMerged['GCD_METCODE'])
    & (dfMerged['Zone Code'] == dfMerged['GCD_Senior_ZONE'])
    & (dfMerged['Municipality Code'] == dfMerged['GCD_CSDUID'])
]
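
The difference can be checked on a few rows of the sample data from the question. This is a sketch under assumptions: the four rows are copied from the question, loading them through io.StringIO is only a convenience for the example, and the names zone_mask and nested are hypothetical.

```python
import io
import pandas as pd

# Four rows taken from the question's sample data (assumed representative).
csv_text = """FileID,MetCode,Municipality Code,Zone Code,GCD_Senior_ZONE,GCD_METCODE,GCD_CSDUID
A100101,7175,1005018,303006,303006,7175,1005018
A100104,7280,1006009,202003,202003,7280,1006009
A100114,9000,1008011,202003,202003,0,1008011
A100115,9000,1001370,101002,101002,0,1001370
"""
dfMerged = pd.read_csv(io.StringIO(csv_text))

# A plain comparison gives a boolean Series with one entry per row.
zone_mask = dfMerged['Zone Code'] == dfMerged['GCD_Senior_ZONE']
# Indexing the frame with that mask gives a DataFrame, not a mask.
nested = dfMerged[zone_mask]
print(type(zone_mask).__name__)  # Series
print(type(nested).__name__)     # DataFrame

# Corrected filter: combine the three boolean Series directly.
dfequal = dfMerged.loc[
    (dfMerged['MetCode'] == dfMerged['GCD_METCODE'])
    & (dfMerged['Zone Code'] == dfMerged['GCD_Senior_ZONE'])
    & (dfMerged['Municipality Code'] == dfMerged['GCD_CSDUID'])
]
print(dfequal['FileID'].tolist())  # ['A100101', 'A100104']
```

The two rows with MetCode 9000 drop out because their GCD_METCODE is 0.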

Answer 2

Here is a working example:

import pandas as pd
import random

a = random.sample([0,1]*5, 10)
b = random.sample([0,1]*5, 10)
ab = random.sample([0,1]*5, 10)
bc = random.sample([0,1]*5, 10)

df = pd.DataFrame({'A':a,'B':b, 'AB':ab,'BC':bc})
df

    A   B   AB  BC
0   0   1   1   0
1   1   0   0   1
2   0   1   0   0
3   1   0   1   1
4   0   1   1   0
5   0   0   1   1
6   1   1   0   0
7   1   0   0   0
8   0   0   0   1
9   1   1   1   1

df[(df['A']==df['AB']) & (df['B']==df['BC'])]

The output is a new dataframe containing only the observations that meet the established criteria.

Output:

    A   B   AB  BC
9   1   1   1   1
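
Because random.sample produces different values on every run, the following sketch hard-codes the table shown above so the result can be reproduced. It also shows DataFrame.query as an alternative spelling of the same filter; query is not used in the original answer and is offered here only as one option.

```python
import pandas as pd

# Values copied from the table printed above (one particular random draw).
df = pd.DataFrame({
    'A':  [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    'B':  [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    'AB': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1],
    'BC': [0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
})

# Boolean-mask version, as in the answer.
by_mask = df[(df['A'] == df['AB']) & (df['B'] == df['BC'])]

# Equivalent query-string version; column names are used directly.
by_query = df.query('A == AB and B == BC')

print(by_query)
```

Both forms select only row 9 for this data. The query form reads well when column names are valid Python identifiers; names with spaces, such as 'Zone Code', need backticks inside the query string.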

Answer 1: 2, accepted, 2019-01-16 17:03:15

Answer 2: 0, 2019-01-16 17:03:02