
Case-insensitive multiple list to column comparison

import pandas as pd

data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)

# Allowed values for each column
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}

# True where a cell's value is in that column's list of allowed values
mask = df_test.apply(lambda c: c.isin(valid[c.name]))

# mask() turns valid and blank (" ") cells into NaN; stack().dropna() keeps only
# the invalid cells. Series.iteritems() was removed in pandas 2.0; use items().
for i, v in df_test.mask(mask | df_test.eq(" ")).stack().dropna().items():
    print(f'error found in row "{i[0]}", column "{i[1]}": "{v}" is invalid')

I would like to make this list-to-column comparison case-insensitive. I have tried:

valid = [x.lower() for x in valid]

When I run the code above, I get "TypeError: list indices must be integers or slices, not str".
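The error comes from iterating over the dictionary itself: a for loop over a dict yields its keys, so the list comprehension replaces valid with a plain list of lowercased column names and throws the allowed values away. The later lookup valid[c.name] then indexes that list with a string. A small illustration, using a trimmed-down copy of valid (an assumption made only to keep the example short):

```python
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "System Type": ["Access", "Video System"],
}

# Iterating over a dict yields its keys, so this builds a list of
# lowercased column names and discards the allowed values entirely.
lowered = [x.lower() for x in valid]
print(lowered)  # ['industry_us', 'system type']

try:
    lowered["Industry_US"]  # a list cannot be indexed by a string
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str

# Lowercasing the values instead keeps the dict shape (column -> allowed values).
valid_lower = {key: [x.lower() for x in value] for key, value in valid.items()}
print(valid_lower["System Type"])  # ['access', 'video system']
```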

Is there a way to make this comparison case-insensitive without changing all the values to lower or upper case?

You can make the comparison case-insensitive without modifying either df_test or valid by replacing this line in your code:

mask = df_test.apply(lambda c: c.isin(valid[c.name]))

with

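# Lowercase the allowed values once, then compare each lowercased column against them
valid_lower = {key: [x.lower() for x in value] for key, value in valid.items()}
mask = df_test.apply(lambda c: c.str.lower().isin(valid_lower[c.name]))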

Then, if for instance you replace Automotive with automotive in valid, running your code still does not flag the Automotive cell in row 0, and it prints:

error found in row "0", column "Industry_EU": "Education" is invalid
error found in row "0", column "System Type": "Enterance" is invalid
error found in row "1", column "Industry_EU": "Hotels" is invalid
error found in row "2", column "Industry_US": "Healthcare" is invalid
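Putting the pieces together, here is a self-contained sketch of the fixed script. It wraps the check in a hypothetical helper, find_invalid (not part of the original post), builds the lowercased lookup once instead of once per column, and swaps in str.casefold() for str.lower(); casefold() also folds characters such as the German ß, while for plain ASCII labels like these the two behave the same. It also calls dropna() after stack(), so it does not depend on stack() discarding the masked NaN cells by itself, which the newer pandas stack implementation (future_stack=True) no longer does.

```python
import pandas as pd

data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)

# "automotive" is deliberately lowercase to exercise the case-insensitive match.
valid = {
    "Industry_US": ["automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}


def find_invalid(df, valid):
    """Return (row, column, value) for every non-blank cell not allowed by valid.

    Hypothetical helper: the comparison is case-insensitive via str.casefold(),
    and neither df nor valid is modified.
    """
    # One case-folded set of allowed values per column, built once.
    valid_folded = {col: {v.casefold() for v in vals} for col, vals in valid.items()}
    # True where the case-folded cell is in its column's allowed set.
    mask = df.apply(lambda c: c.str.casefold().isin(valid_folded[c.name]))
    # mask() turns valid and blank cells into NaN; stack().dropna() removes them.
    leftovers = df.mask(mask | df.eq(" ")).stack().dropna()
    return [(row, col, value) for (row, col), value in leftovers.items()]


for row, col, value in find_invalid(df_test, valid):
    print(f'error found in row "{row}", column "{col}": "{value}" is invalid')
```

Replacing casefold() with lower() in both places gives exactly the behavior of the answer's one-line change for this data.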

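Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.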

© 2020-2024 STACKOOM.COM