How to check for conflicts between columns in a pandas DataFrame?

Question

I'm working on a DataFrame that contains multiple possible values, from three different sources, for a single item, which is in the index, such as:

import pandas as pd
import numpy as np

inp = [
    {"Item": "Item1", "Local A": np.nan, "Local B": 6, "Local C": 5},
    {"Item": "Item2", "Local A": 6, "Local B": 7, "Local C": 5},
    {"Item": "Item3", "Local A": np.nan, "Local B": np.nan, "Local C": 5},
    {"Item": "Item4", "Local A": 5, "Local B": 5, "Local C": 5},
    {"Item": "Item5", "Local A": 5, "Local B": np.nan, "Local C": 5},
]
df = pd.DataFrame(inp)
print(df)

Output:

    Item  Local A  Local B  Local C
0  Item1      NaN      6.0        5
1  Item2      6.0      7.0        5
2  Item3      NaN      NaN        5
3  Item4      5.0      5.0        5
4  Item5      5.0      NaN        5

My goal is to create a column that specifies whether there is a conflict between sources when an index has multiple non-null values (some cells are empty).

Ideal Output:

    Item  Local A  Local B  Local C Conflict
0  Item1      NaN      6.0        5      yes
1  Item2      6.0      7.0        5      yes
2  Item3      NaN      NaN        5      NaN
3  Item4      5.0      5.0        5      NaN
4  Item5      5.0      NaN        5      NaN

To do that, I decided to build a filter that checks whether all three sources are non-null and whether they differ.

I also built filters for the three other cases, in which only two values are available for an index.

condition1 = (
    df["Local A"].notnull() & df["Local B"].notnull() & df["Local C"].notnull()
) & ~(df["Local A"] == df["Local B"] == df["Local C"])

condition2 = (df["Local A"].notnull() & df["Local B"].notnull()) & ~(
    df["Local A"] == df["Local B"]
)

condition3 = (df["Local B"].notnull() & df["Local C"].notnull()) & ~(
    df["Local B"] == df["Local C"]
)

condition4 = (df["Local A"].notnull() & df["Local C"].notnull()) & ~(
    df["Local A"] == df["Local C"]
)


df.loc[condition1 | condition2 | condition3 | condition4, "Conflict"] = "yes"

This solution of enumerating the different possible outcomes is not very elegant, but I wasn't able to find a simpler alternative. Moreover, I get the following error while running the script:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I've seen this a few times and was able to find the cause, but I just can't figure this one out. It seems that I'm comparing boolean Series instead of individual cases like I want to.

Answer 1

IIUC, try:

df['Conflict'] = np.where((df.iloc[:, 1:].nunique(axis=1) != 1),'Yes',np.nan)

Output:

    Item  Local A  Local B  Local C Conflict
0  Item1      NaN      6.0        5      Yes
1  Item2      6.0      7.0        5      Yes
2  Item3      NaN      NaN        5      nan
3  Item4      5.0      5.0        5      nan
4  Item5      5.0      NaN        5      nan
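
For reference, the ValueError in the question most likely comes from the chained comparison in condition1: Python evaluates `a == b == c` as `(a == b) and (b == c)`, and `and` needs a single truth value from a whole Series. The sketch below is one possible variant of the approach above, not the only way to do it. It selects the source columns by name (the `sources` list is an illustrative name, not from the original post) and assigns through `df.loc`, so non-conflicting rows keep a real NaN rather than the string 'nan'. It relies on `nunique` skipping NaN by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        {"Item": "Item1", "Local A": np.nan, "Local B": 6, "Local C": 5},
        {"Item": "Item2", "Local A": 6, "Local B": 7, "Local C": 5},
        {"Item": "Item3", "Local A": np.nan, "Local B": np.nan, "Local C": 5},
        {"Item": "Item4", "Local A": 5, "Local B": 5, "Local C": 5},
        {"Item": "Item5", "Local A": 5, "Local B": np.nan, "Local C": 5},
    ]
)

# Illustrative name: the columns that hold the three sources.
sources = ["Local A", "Local B", "Local C"]

# nunique ignores NaN by default, so a row is in conflict only when
# it holds more than one distinct non-null value.
conflict = df[sources].nunique(axis=1) > 1

# Assign only the conflicting rows; the remaining rows stay as real NaN.
df.loc[conflict, "Conflict"] = "yes"

# Element-wise replacement for the chained comparison a == b == c:
# combine the two boolean Series with & instead of chaining ==.
a, b, c = (df[col] for col in sources)
all_equal = (a == b) & (b == c)

print(df)
```

With real NaN in the column, `df["Conflict"].isna()` can be used later to pick out the rows without a conflict, which does not work when the placeholder is the string 'nan'.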

Answer 1: accepted, 2021-06-15 13:39:53
