Case-insensitive multiple list to column comparison
import pandas as pd

data = [
    ["Automotive", "Education", "Enterance", "Commercial"],
    ["Gas", "Hotels", "Access", " "],
    ["Healthcare", " ", "Video System", "Reseller"],
]
df_test = pd.DataFrame(
    data, columns=["Industry_US", "Industry_EU", "System Type", "Account Type"]
)
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}
# True where a cell holds one of the allowed values for its column
mask = df_test.apply(lambda c: c.isin(valid[c.name]))
# Blank out valid and empty cells, then report whatever is left;
# Series.items() replaces iteritems(), which pandas 2.0 removed
for i, v in df_test.mask(mask | df_test.eq(" ")).stack().dropna().items():
    print(f'error found in row "{i[0]}", column "{i[1]}": "{v}" is invalid')
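I want to make this list-to-column comparison case-insensitive. I tried:
valid = [x.lower() for x in valid]
When I run the code above, I get "TypeError: list indices must be integers or slices, not str".
Is there a way to make this comparison case-insensitive without changing all the values to lower or upper case?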
You can make the comparison case-insensitive without modifying either df_test or valid by replacing this line in your code:
mask = df_test.apply(lambda c: c.isin(valid[c.name]))
with:
mask = df_test.apply(
    lambda c: c.str.lower().isin(
        {key: [x.lower() for x in value] for key, value in valid.items()}[c.name]
    )
)
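The replacement above rebuilds the lowercased dictionary once per column. A possible variation, sketched here under a few assumptions, builds the lookup once and compares against it column by column. The names valid_folded and find_invalid are hypothetical and not part of the original code, and str.casefold() is swapped in for str.lower() as a slightly more aggressive form of case folding.

```python
import pandas as pd

df_test = pd.DataFrame(
    [
        ["Automotive", "Education", "Enterance", "Commercial"],
        ["Gas", "Hotels", "Access", " "],
        ["Healthcare", " ", "Video System", "Reseller"],
    ],
    columns=["Industry_US", "Industry_EU", "System Type", "Account Type"],
)
valid = {
    "Industry_US": ["automotive", "Retail", "Gas", "Other"],  # note the lowercase entry
    "Industry_EU": ["Real Estate", "Transport", "Mining"],
    "System Type": ["Access", "Video System"],
    "Account Type": ["Commercial", "Reseller", "Small"],
}

# Hypothetical name: folded copy of the allowed values, built a single time.
# Neither df_test nor valid is modified.
valid_folded = {
    col: {value.casefold() for value in values} for col, values in valid.items()
}


def find_invalid(df, allowed):
    """Return (row, column, value) for each non-blank cell not allowed in its column."""
    errors = []
    for col in df.columns:
        folded = df[col].astype(str).str.casefold()
        # A cell passes if it matches an allowed value or is the blank placeholder " "
        ok = folded.isin(allowed[col]) | df[col].eq(" ")
        for row, value in df.loc[~ok, col].items():
            errors.append((row, col, value))
    return errors


errors = find_invalid(df_test, valid_folded)
for row, col, value in sorted(errors):
    print(f'error found in row "{row}", column "{col}": "{value}" is invalid')
```

Because the comparison runs on folded copies, the values reported in the error messages keep their original casing.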
Then, for example, if you replace Automotive with automotive in valid, running your code prints no error for that word:
error found in row "0", column "Industry_EU": "Education" is invalid
error found in row "0", column "System Type": "Enterance" is invalid
error found in row "1", column "Industry_EU": "Hotels" is invalid
error found in row "2", column "Industry_US": "Healthcare" is invalid
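As for the TypeError in the question: iterating over a dictionary yields its keys, so the list comprehension replaces valid with a plain list of lowercased column names, and the later lookup valid[c.name] then indexes a list with a string. A minimal sketch of the effect, using a trimmed-down valid for illustration (the names attempt and valid_lower are hypothetical):

```python
valid = {
    "Industry_US": ["Automotive", "Retail", "Gas", "Other"],
    "System Type": ["Access", "Video System"],
}

# Iterating over a dict yields its keys, so this produces a list of
# lowercased column names and drops the allowed values entirely.
attempt = [x.lower() for x in valid]
print(attempt)

# Indexing that list with a column name is what raises the TypeError.
try:
    attempt["Industry_US"]
except TypeError as exc:
    message = str(exc)
    print(message)

# Lowercasing the values while keeping the keys preserves the lookup.
valid_lower = {key: [x.lower() for x in values] for key, values in valid.items()}
print(valid_lower["Industry_US"])
```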