
Invalid type comparison with Booleans in Pandas

I'm trying to clean the Country (Ctry) column in a pandas DataFrame (origin) based on other row-level data, or on other DataFrames with similar data. See the links for example data frames.

It will eventually feed two new columns in the DataFrame, giving a correctly formatted country and a data quality "score".

Links: Origin DataFrame; Nafta, Countries, and States DataFrames

The function works on values that are in the lookup tables, and on blanks, but when I pass "bad" data in, it raises an "invalid type comparison" error. Testing this expression separately returns a boolean and works:

Nafta.loc[Nafta[col] == a].empty

I'm not sure why this doesn't work. I've tested the values, and the comparison is Boolean to Boolean. See the custom function and the lambda that calls it:

import pandas as pd

def CountryScore(a, b, c):
    # a: country, b: state, c: zip code, all taken from one row of origin
    if pd.isnull(a):
        score = "blank"
        if pd.notnull(b):
            # No country given: try to infer it from the States lookup table
            for col in States:
                if not States.loc[States[col] == b].empty:
                    corfor = States.iloc[States.loc[States[col] == b].index[-1], 2]
                    break
                else:
                    corfor = "Bad Data"
        elif pd.notnull(c):
            # No state either: guess the country from the zip code length
            if (len(str(c).strip()) <= 5) or (len(str(c).strip()) > 9):
                corfor = "USA"
            else:
                corfor = "CAN"
        else:
            corfor = "Bad Data"
    else:
        # Country given: look it up in every column of Nafta
        for col in Nafta:
            if not Nafta.loc[Nafta[col] == a].empty:
                score = "good"
                corfor = Nafta.iloc[Nafta.loc[Nafta[col] == a].index[-1], 1]
                break
            else:
                score = "pending"
    if score == "pending":
        # Not found in Nafta: fall back to the Country lookup table
        for col in Country:
            if not Country.loc[Country[col] == a].empty:
                score = "good"
                corfor = Country.iloc[Country.loc[Country[col] == a].index[-1], 2]
                break
            else:
                score = "bad"
                corfor = "Bad Data"
    return score, corfor

origin["Origin Ctry Score"] , origin["Origin Ctry Format"] = zip(*origin.apply(lambda x: CountryScore(x["Origin Ctry"], x["Origin State"], x["Origin Zip"]), axis = 1))

Assume the DataFrames are already loaded. Thanks!

I was able to find my mistake. In the last column of Country, I was comparing an integer to a string. It had nothing to do with Booleans. Fixed with:

Country.loc[Country[col].astype(str)== a].empty != True
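To make the cause easier to see, here is a minimal sketch of the same situation. The Country table below is a hypothetical stand-in (the real one is only linked, not shown), and the column names are assumptions. It lists the numeric columns, which are the ones at risk when compared with a string, and then repeats the lookup with the astype(str) cast. Whether the uncast comparison raises an error or silently returns all False seems to depend on the pandas and NumPy versions, so the sketch only exercises the cast version.

```python
import pandas as pd

# Hypothetical stand-in for the Country lookup table; column names are assumptions.
Country = pd.DataFrame({
    "Name": ["United States", "Canada", "Mexico"],
    "Alpha3": ["USA", "CAN", "MEX"],
    "Numeric": [840, 124, 484],  # integer dtype, unlike the other columns
})

a = "Atlantis"  # a "bad" value that appears in none of the columns

# Columns holding numbers are the ones at risk when compared with a string.
numeric_cols = [col for col in Country if pd.api.types.is_numeric_dtype(Country[col])]
print(numeric_cols)

# Casting each column to str first makes every comparison string-to-string.
for col in Country:
    matches = Country.loc[Country[col].astype(str) == a]
    print(col, matches.empty)
```

With the cast in place, a numeric code passed as text (for example "484") can also match the integer column.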

I will end up wrapping most of the comparisons in this type of transformation.
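An alternative to casting inside every comparison might be to convert each lookup table to text once, up front. The sketch below is only an illustration under assumptions: the Nafta contents, the column names, and the helpers as_text and lookup are all hypothetical and not part of the original code. Note that astype(str) also turns missing values into the text "nan", so blanks may need separate handling.

```python
import pandas as pd

# Hypothetical stand-in for the Nafta lookup table; contents are assumptions.
Nafta = pd.DataFrame({"Code": [1, 2, 3], "Alpha3": ["USA", "CAN", "MEX"]})

def as_text(table):
    """Return a copy with every column cast to str and surrounding whitespace stripped."""
    return table.astype(str).apply(lambda s: s.str.strip())

def lookup(table, value, result_col):
    """Return result_col from the last row where any column equals value, else None."""
    hits = table[(table == str(value)).any(axis=1)]
    return None if hits.empty else hits.iloc[-1][result_col]

Nafta_text = as_text(Nafta)  # cast once, reuse for every lookup

print(lookup(Nafta_text, 2, "Alpha3"))
print(lookup(Nafta_text, "MEX", "Alpha3"))
print(lookup(Nafta_text, "Atlantis", "Alpha3"))
```

Casting once keeps the per-row function free of dtype concerns, at the cost of losing the numeric dtype in the converted copy.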
