I'm trying to locate some bad data that has been inserted into a table. Likely by someone doing a copy/paste from Word then inserting into the database.
I have seen the similar questions like Query for finding rows with special characters
but this doesn't quite work for what I'm needing. Essentially I want to only return back a data set not including any standard characters and catch things such as an endash (just one example).
I have tried using something like this
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z \-@\.]%'
but it returns back every single single record.
In case it is of benefit for anyone else that comes along later. Ultimately the issue I was having was due to the placement of the hyphen (-) character as was also noted by sgmoore below. I moved this to the beginning of my range just following the not (^) character.
Also, based on the info provided by gbn that LIKE is not actually using regexes I revisited the Microsoft documentation here SQL Server LIKE Statement . I was using the backslash character unnecessarily as an escape character due to my assumption it was the same as a regex. These were unnecessary, and apparently escape characters are only needed with wildcard characters. The doc I linked also mentions using an ESCAPE clause following the LIKE range to specify what character is to be used as an escape character eg WHERE percent_complete LIKE '%50!%' ESCAPE '!' would match a string that actually ends in 50% (50%, 150%).
Here is what I ended up using to screen my email data for bad characters; for me it works, but it may not be complete for all cases.
SELECT * FROM mytable WHERE email LIKE '%[^-0-9a-zA-Z_@.]%'
also if it is helpful, I needed to do something similar on a couple of other generic text fields; this far from comprehensive, but it narrowed my result set down to just a handful of records that I was then able to visually determine what I was looking for.
SELECT * from mytable WHERE text_field LIKE '%[^-0-9a-zA-Z @.''?:/,+&();_]%'
Try
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z @\.\-]%'
It would look like the position of the - sign on your version is causing problems.
Use double negatives
... WHERE email NOT LIKE '%[^0-9a-zA-Z ,-@\.]%'
Sample data would be useful too
Presumably, every email has a @
character as well as .
. You might try:
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z ,\]%'
If your original list is what you really want, then you need to escape -
:
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z ,\-@\.]%'
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.