Case-insensitive multiple list to column comparison

Question

import pandas as pd

data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)

valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}

mask = df_test.apply(lambda c: c.isin(valid[c.name]))

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent call.
for i, v in df_test.mask(mask | df_test.eq(" ")).stack().items():
    print(f'error found in row "{i[0]}", column "{i[1]}": "{v}" is invalid')

I would like to make this list-to-column comparison case-insensitive. I have tried:

valid= [x.lower() for x in valid]

When I run the code above, I get "TypeError: list indices must be integers or slices, not str".

Is there a way to make this comparison case-insensitive without changing all the values to lowercase or uppercase?

Answer 1

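The TypeError occurs because iterating over a dict yields only its keys, so your list comprehension replaces valid with a plain list of lowercased column names, and valid[c.name] then fails. You can make the comparison case-insensitive without modifying either df_test or valid by replacing this line in your code: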

mask = df_test.apply(lambda c: c.isin(valid[c.name]))

with

mask = df_test.apply(
    lambda c: c.str.lower().isin(
        {key: [x.lower() for x in value] for key, value in valid.items()}[c.name]
    )
)

Then if, for instance, you replace Automotive with automotive in valid, running your code will not report an error for that value:

error found in row "0", column "Industry_EU": "Education" is invalid
error found in row "0", column "System Type": "Enterance" is invalid
error found in row "1", column "Industry_EU": "Hotels" is invalid
error found in row "2", column "Industry_US": "Healthcare" is invalid
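The replacement above rebuilds the lowercased dictionary once for every column. As a possible variation (a minimal sketch, not part of the original answer), the lowercased lookup can be built once and the check wrapped in a helper. The names find_invalid and valid_lower are hypothetical, and the data is the sample from the question with automotive lowercased in valid:

```python
import pandas as pd


def find_invalid(df, valid):
    """Return (row, column, value) for each non-blank cell not in valid, ignoring case."""
    # Lowercase every allowed value once; the inputs themselves are left untouched.
    valid_lower = {col: {v.lower() for v in values} for col, values in valid.items()}
    errors = []
    for col in df.columns:
        cells = df[col]
        is_blank = cells.str.strip().eq("")
        is_valid = cells.str.lower().isin(valid_lower[col])
        # Keep only cells that are neither blank nor an allowed value.
        for row, value in cells[~(is_valid | is_blank)].items():
            errors.append((row, col, value))
    return errors


data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)

valid = {
    "Industry_US": ["automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}

for row, col, value in find_invalid(df_test, valid):
    print(f'error found in row "{row}", column "{col}": "{value}" is invalid')
```

This version reports errors column by column rather than row by row, and it does not rely on stack() to drop the masked cells. If the data may contain non-ASCII text, str.casefold() in place of lower() on both sides is probably the safer comparison.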

solution1
0 2022-08-29 15:37:09