
How do I check for conflict between columns in a pandas dataframe?

I'm working on a DataFrame which contains multiple possible values from three different sources for a single item, which is in the index, such as:

import pandas as pd
import numpy as np

inp = [
    {"Item": "Item1", "Local A": np.nan, "Local B": 6, "Local C": 5},
    {"Item": "Item2", "Local A": 6, "Local B": 7, "Local C": 5},
    {"Item": "Item3", "Local A": np.nan, "Local B": np.nan, "Local C": 5},
    {"Item": "Item4", "Local A": 5, "Local B": 5, "Local C": 5},
    {"Item": "Item5", "Local A": 5, "Local B": np.nan, "Local C": 5},
]
df = pd.DataFrame(inp)
print(df)

Output:

    Item  Local A  Local B  Local C
0  Item1      NaN      6.0        5
1  Item2      6.0      7.0        5
2  Item3      NaN      NaN        5
3  Item4      5.0      5.0        5
4  Item5      5.0      NaN        5

My goal is to create a column which specifies whether there is a conflict between sources when an item has multiple non-null values (some cells are empty).

Ideal output:

    Item  Local A  Local B  Local C Conflict
0  Item1      NaN      6.0        5      yes
1  Item2      6.0      7.0        5      yes
2  Item3      NaN      NaN        5      NaN
3  Item4      5.0      5.0        5      NaN
4  Item5      5.0      NaN        5      NaN

To do that, I decided to build a filter that checks whether the three sources are all non-null and whether they differ.

I also built filters for the three other cases, where only two values are available for an item.

condition1 = (
    df["Local A"].notnull() & df["Local B"].notnull() & df["Local C"].notnull()
) & ~(df["Local A"] == df["Local B"] == df["Local C"])

condition2 = (df["Local A"].notnull() & df["Local B"].notnull()) & ~(
    df["Local A"] == df["Local B"]
)

condition3 = (df["Local B"].notnull() & df["Local C"].notnull()) & ~(
    df["Local B"] == df["Local C"]
)

condition4 = (df["Local A"].notnull() & df["Local C"].notnull()) & ~(
    df["Local A"] == df["Local C"]
)


df.loc[condition1 | condition2 | condition3 | condition4, "Conflict"] = "yes"

This solution of enumerating the different possible outcomes is not very elegant, but I wasn't able to find a simpler alternative. Moreover, I get the following error while running the script:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I've seen this error a few times before and was able to find the cause, but I just can't figure this one out. It seems that I'm comparing boolean Series as a whole instead of comparing row by row like I want to.
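On the error itself: a likely cause is the chained comparison in condition1. Python expands `x == y == z` into `(x == y) and (y == z)`, and `and` asks for a single truth value from a whole Series. The sketch below reproduces this with three small Series (`a`, `b`, `c` are illustrative names, not from the script above) and shows an element-wise alternative using `&`.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for three source columns.
a = pd.Series([np.nan, 6, 5])
b = pd.Series([6, 7, 5])
c = pd.Series([5, 5, 5])

# `a == b == c` is evaluated as `(a == b) and (b == c)`.
# `and` calls bool() on the Series produced by `a == b`, which pandas refuses.
try:
    a == b == c
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Element-wise version: combine the two comparisons with `&`.
all_equal = (a == b) & (b == c)
print(raised)
print(all_equal.tolist())
```

Applied to condition1, this would mean writing the equality test as `(df["Local A"] == df["Local B"]) & (df["Local B"] == df["Local C"])`.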

IIUC, try:

df['Conflict'] = np.where((df.iloc[:, 1:].nunique(axis=1) != 1),'Yes',np.nan)

Output:

    Item  Local A  Local B  Local C Conflict
0  Item1      NaN      6.0        5      Yes
1  Item2      6.0      7.0        5      Yes
2  Item3      NaN      NaN        5      nan
3  Item4      5.0      5.0        5      nan
4  Item5      5.0      NaN        5      nan
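Note that the non-conflict rows above hold the text 'nan' rather than a real missing value, because `np.where` builds a single string array from 'Yes' and `np.nan`. Also, `!= 1` would flag a row with no values at all, since its distinct count is 0. A minimal sketch of a variant that keeps real NaN and uses the lowercase "yes" from the question, assuming the three source columns are selected by name (`source_cols` and `distinct` are names introduced here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
        "Local A": [np.nan, 6, np.nan, 5, 5],
        "Local B": [6, 7, np.nan, 5, np.nan],
        "Local C": [5, 5, 5, 5, 5],
    }
)

# Hypothetical helper name: the columns holding the three sources.
source_cols = ["Local A", "Local B", "Local C"]

# nunique(axis=1) skips NaN by default, so this counts the
# distinct non-null values in each row.
distinct = df[source_cols].nunique(axis=1)

# More than one distinct value means at least two sources disagree.
# Mapping False to np.nan leaves a real missing value, not the string 'nan'.
df["Conflict"] = (distinct > 1).map({True: "yes", False: np.nan})
print(df)
```

With this version `df["Conflict"].isna()` is True for the rows without a conflict, so the column can be filtered with the usual `isna()` / `notna()` methods.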
