Spelling check for a dataframe column in python

I have a large dataframe of around 10,000 rows of user-entered data, which contains typos. There's a column of job titles, and I would like to search it for specific titles, but because of the spelling mistakes I can't get all the rows I need.

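Currently what I have is:

titles = ['vet', 'doctor', 'teacher']
targetInfo = {}  # title -> rows whose job title contains it (case-insensitive, NaN counts as no match)
for title in titles:
    targetInfo[title] = df[df['jobtitles'].str.contains(title, na=False, case=False)]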

Any ideas on how to account for spelling mistakes?
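One approach not raised in the thread is fuzzy string matching. Below is a minimal sketch using Python's standard-library `difflib`. The helper name `closest_title`, the word-by-word comparison, and the 0.75 similarity cutoff are all assumptions to tune against the real data, not an established recipe. Because each word of a job title is compared to the wanted titles, an entry like "Senior Teachr" can still map to "teacher".

```python
import difflib
import re

# The titles we want to find (taken from the question)
TITLES = ["vet", "doctor", "teacher"]


def closest_title(raw, titles=TITLES, cutoff=0.75):
    """Return the wanted title that best matches some word of `raw`, or None.

    Hypothetical helper: each alphabetic word of the lower-cased input is
    compared with difflib.get_close_matches; the first word whose best
    similarity ratio is at least `cutoff` (an assumed threshold) wins.
    """
    if not isinstance(raw, str):
        return None  # missing values such as None or NaN never match
    for word in re.findall(r"[a-z]+", raw.lower()):
        matches = difflib.get_close_matches(word, titles, n=1, cutoff=cutoff)
        if matches:
            return matches[0]
    return None


print(closest_title("Doctr"))          # doctor
print(closest_title("Senior Teachr"))  # teacher
print(closest_title("plumber"))        # None
```

With pandas, the helper can be applied to the whole column, e.g. `df['matched_title'] = df['jobtitles'].map(closest_title)`, after which `df[df['matched_title'] == 'vet']` also picks up misspelled rows. A cutoff that is too low starts pulling in unrelated words, so it is worth checking a sample of the matches by eye.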

You could use the pandas unique() method, e.g.:

df["jobtitles"].unique()

This lists every distinct value in the column, so you can spot the misspelled variants and then replace them with the correct titles (for example in an Excel sheet).
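The same clean-up can also be done in pandas instead of Excel. A minimal sketch, assuming a small made-up sample frame and a correction mapping written by hand after inspecting the unique values (the `clean_title` column and the `corrections` dict are hypothetical names):

```python
import pandas as pd

# Made-up sample standing in for the real ~10,000-row frame
df = pd.DataFrame(
    {"jobtitles": ["Vet", "vett", "Doctor", "doctr", "Teacher", "teachr", None]}
)

# 1. List the distinct (lower-cased) spellings to spot the typos
print(df["jobtitles"].str.lower().unique())

# 2. Map each misspelling spotted in step 1 to the correct title (written by hand)
corrections = {"vett": "vet", "doctr": "doctor", "teachr": "teacher"}

# 3. Lower-case, then replace exact matches; unmapped and missing values stay as they are
df["clean_title"] = df["jobtitles"].str.lower().replace(corrections)

# 4. An exact comparison on the cleaned column now also returns the corrected rows
vets = df[df["clean_title"] == "vet"]
print(vets)
```

`value_counts()` is an alternative to `unique()` in step 1 that also shows how often each spelling occurs, which helps decide which typos are worth mapping first.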
