import pandas as pd

# Sample data; " " marks an empty cell and should not be reported
data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)
# Allowed values for each column
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}
# True where a cell holds one of the allowed values for its column
mask = df_test.apply(lambda c: c.isin(valid[c.name]))
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for i, v in df_test.mask(mask | df_test.eq(" ")).stack().items():
    print(f'error found in row "{i[0]}", column "{i[1]}": "{v}" is invalid')
I would like to make this list-to-column comparison case-insensitive. I have tried:
valid = [x.lower() for x in valid]
When I run the code above, I get "TypeError: list indices must be integers or slices, not str".
Is there a way to make this comparison case-insensitive without changing all the values to lower or upper case?
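The TypeError can be traced to the list comprehension itself: iterating over a dict yields its keys, so the attempt turns valid into a plain list of lowercased column names, and valid[c.name] then indexes a list with a string. The sketch below reproduces this with a shortened copy of valid; the names as_list, message and valid_lower are introduced here for illustration only.

```python
# Shortened copy of the question's dict, for illustration
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "System Type": ["Access", "Video System"],
}

# Iterating over a dict yields its keys, so this builds a list of
# lowercased column names and drops the allowed values entirely.
as_list = [x.lower() for x in valid]

# Indexing that list with a column name raises the reported TypeError.
try:
    as_list["Industry_US"]
except TypeError as exc:
    message = str(exc)

# Keeping the dict shape instead: lowercase the values under each key.
valid_lower = {key: [x.lower() for x in values] for key, values in valid.items()}

print(as_list)
print(message)
print(valid_lower["System Type"])
```

A dict comprehension over valid.items() keeps the column names as keys, which is what the lookup by c.name relies on.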
You can make the comparison case-insensitive without modifying either df_test or valid by replacing this line in your code:
mask = df_test.apply(lambda c: c.isin(valid[c.name]))
with
mask = df_test.apply(
    lambda c: c.str.lower().isin(
        {key: [x.lower() for x in value] for key, value in valid.items()}[c.name]
    )
)
Then if, for instance, you replace Automotive with automotive in valid, running your code prints no error for that word; only the values that are actually invalid are reported:
error found in row "0", column "Industry_EU": "Education" is invalid
error found in row "0", column "System Type": "Enterance" is invalid
error found in row "1", column "Industry_EU": "Hotels" is invalid
error found in row "2", column "Industry_US": "Healthcare" is invalid
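The lambda above rebuilds the lowercased dictionary once per column. As a minimal sketch of a variant, assuming the same df_test and valid as in the question (with automotive in lower case, as in the example above), the lookup can be built once beforehand. The names valid_lower and errors are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

df_test = pd.DataFrame(
    [
        ["Automotive", "Education", "Enterance", "Commercial"],
        ["Gas", "Hotels", "Access", " "],
        ["Healthcare", " ", "Video System", "Reseller"],
    ],
    columns=["Industry_US", "Industry_EU", "System Type", "Account Type"],
)
valid = {
    "Industry_US": ["automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}

# Build the lowercased lookup once; df_test and valid stay unchanged.
valid_lower = {col: {v.lower() for v in values} for col, values in valid.items()}

# Compare a lowercased view of each column against its lowercased set.
mask = df_test.apply(lambda c: c.str.lower().isin(valid_lower[c.name]))

# Blank out valid and empty cells, then collect what is left.
# The explicit dropna() keeps the result independent of whether
# stack() drops missing values by itself.
remaining = df_test.mask(mask | df_test.eq(" ")).stack().dropna()
errors = [(row, col, value) for (row, col), value in remaining.items()]

for row, col, value in errors:
    print(f'error found in row "{row}", column "{col}": "{value}" is invalid')
```

The reported values keep their original casing, because only the comparison uses lowercased text.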