I'm searching for some advice.
I have a data frame with 67,000 records. The problem is that my "Country" column used to be a free-text field (it is now a dropdown selection), so there are various values for the same country. For example, there is DE, Germany, Alemania, etc., which means that I cannot just take the first two characters of each string, because in the example above German sales would be moved into Georgia (GE).
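To illustrate the pitfall, here is a small sketch on made-up data (the sample values are hypothetical, not my real records). It assumes the column holds plain strings and shows what naive truncation to two characters does:

```python
import pandas as pd

# Hypothetical sample; the real frame has about 67,000 rows.
df = pd.DataFrame({"Country": ["DE", "Germany", "Alemania", "GE"]})

# Naive normalisation: keep the first two characters, upper-cased.
naive = df["Country"].str[:2].str.upper()

print(naive.tolist())  # → ['DE', 'GE', 'AL', 'GE']
```

"Germany" ends up as GE (Georgia) and "Alemania" as AL (Albania), so both would be counted under the wrong country.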
I was wondering if anyone has had experience with this problem before and has a solution. I'm thinking I should change all the strings with more than 2 characters to "unlisted" and carry out a separate analysis on those. I am not too sure how to go about selecting the bad cells.
Would this be done with a regex, or with a df.query?
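For reference, this is roughly what I have in mind, as a minimal sketch rather than a working solution. The sample data, the `Sales` column, the `CountryClean` column and the `variants` lookup are all made up for illustration; I am assuming that any value that is exactly two letters is already a valid code.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({
    "Country": ["DE", "Germany", "Alemania", "GE", " fr ", "Deutschland", None],
    "Sales": [10, 20, 30, 40, 50, 60, 70],
})

# Hypothetical lookup of known free-text variants -> two-letter code.
variants = {"germany": "DE", "alemania": "DE"}

cleaned = df["Country"].str.strip()

# Regex mask: True where the value is exactly two letters (missing -> False).
is_code = cleaned.str.fullmatch(r"[A-Za-z]{2}", na=False)

# Look up known variants; values without an entry become NaN.
mapped = cleaned.str.lower().map(variants)

# Start with everything "unlisted", then fill in what can be resolved.
df["CountryClean"] = "unlisted"
df.loc[is_code, "CountryClean"] = cleaned[is_code].str.upper()
df.loc[mapped.notna(), "CountryClean"] = mapped[mapped.notna()]

# The rows left over for the separate analysis.
unlisted = df.query("CountryClean == 'unlisted'")
```

Here the regex only builds the boolean mask, and `df.query` is just one way to pull out the leftover rows afterwards; `df[df["CountryClean"] == "unlisted"]` would select the same rows.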
Thanks in advance!