SQL Server 2008, searching for special characters

I'm trying to locate some bad data that has been inserted into a table, likely by someone copying and pasting from Word and then inserting the result into the database.

I have seen similar questions, such as "Query for finding rows with special characters", but they don't quite work for what I need. Essentially, I want to return only the rows that contain something other than standard characters, which would catch things such as an en dash (just one example).

I have tried using something like this

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z \-@\.]%'

but it returns every single record.


EDIT

In case it benefits anyone who comes along later: ultimately, the issue was the placement of the hyphen (-) character, as sgmoore also noted below. I moved it to the beginning of the character set, immediately after the negation (^) character, where it is treated as a literal hyphen rather than as a range operator.
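To make the hyphen issue concrete, here is a minimal sketch that runs both patterns side by side. The table variable @sample and its rows are made up for illustration. My reading, which I have not verified against every collation, is that without an ESCAPE clause the backslash is an ordinary character, so \-@ is parsed as a range from \ to @ and the @ sign is no longer an allowed literal.

```sql
-- Hypothetical sample rows; none of these values come from the original table.
DECLARE @sample TABLE (email NVARCHAR(100));

INSERT INTO @sample (email)
VALUES (N'jane.doe@example.com'),
       (N'mary-ann@example.com'),
       (N'john' + NCHAR(8211) + N'smith@example.com');  -- NCHAR(8211) is an en dash

SELECT email,
       -- Original pattern: no ESCAPE clause, so \ is an ordinary character and
       -- \-@ is read as a range instead of a literal hyphen followed by @.
       CASE WHEN email LIKE N'%[^0-9a-zA-Z \-@\.]%' THEN 1 ELSE 0 END AS flagged_by_original,
       -- Revised pattern: the hyphen comes first, so it is a literal hyphen.
       CASE WHEN email LIKE N'%[^-0-9a-zA-Z_@.]%' THEN 1 ELSE 0 END AS flagged_by_revised
FROM @sample;
```

If that reading is right, flagged_by_original should come back as 1 for every row, while flagged_by_revised should be 1 only for the row containing the en dash.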

Also, based on the information from gbn that LIKE does not actually use regular expressions, I revisited the Microsoft documentation for the SQL Server LIKE statement. I had been using the backslash as an escape character on the assumption that LIKE patterns behave like regexes. Those backslashes were unnecessary; apparently escape characters are only needed for wildcard characters. The documentation also describes an ESCAPE clause that follows the LIKE pattern and specifies which character acts as the escape character. For example, WHERE percent_complete LIKE '%50!%' ESCAPE '!' matches a string that actually ends in 50% (50%, 150%).
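A small sketch of the ESCAPE clause from that documentation example. The table variable @progress and its rows are invented for the demonstration; only the pattern itself comes from the text above.

```sql
-- Hypothetical sample rows for the percent_complete example.
DECLARE @progress TABLE (percent_complete VARCHAR(20));

INSERT INTO @progress (percent_complete)
VALUES ('50%'), ('150%'), ('500'), ('50% done');

-- '!' is declared as the escape character, so '!%' means a literal percent sign.
-- The leading '%' is still a wildcard that matches any prefix.
SELECT percent_complete
FROM @progress
WHERE percent_complete LIKE '%50!%' ESCAPE '!';
```

This returns '50%' and '150%'. The value '500' has no percent sign, and '50% done' does not end in 50%, so neither matches.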

Here is what I ended up using to screen my email data for bad characters; for me it works, but it may not be complete for all cases.

SELECT * FROM mytable WHERE email LIKE '%[^-0-9a-zA-Z_@.]%'

Also, if it is helpful, I needed to do something similar on a couple of other generic text fields. This is far from comprehensive, but it narrowed my result set down to just a handful of records, which I could then inspect visually to find what I was looking for.

SELECT * FROM mytable WHERE text_field LIKE '%[^-0-9a-zA-Z @.''?:/,+&();_]%'
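For anyone who wants to try the text-field pattern before running it against real data, here is a rough sketch. The table variable @notes and its rows are hypothetical. Note that the doubled single quote ('') inside the pattern is just how a literal apostrophe is written in a SQL string; it has nothing to do with LIKE escaping.

```sql
-- Hypothetical sample rows; the last two contain typical Word paste-ins.
DECLARE @notes TABLE (text_field NVARCHAR(200));

INSERT INTO @notes (text_field)
VALUES (N'Call Bob''s office (ext. 42) at 3:00, re: Q&A'),  -- only allowed characters
       (N'Bob' + NCHAR(8217) + N's office'),                -- right single quotation mark
       (N'Pages 10' + NCHAR(8211) + N'12');                 -- en dash

-- Flag rows containing any character outside the allowed set.
SELECT text_field
FROM @notes
WHERE text_field LIKE N'%[^-0-9a-zA-Z @.''?:/,+&();_]%';
```

The first row uses only characters from the allowed set, so the query should flag just the two rows with the curly apostrophe and the en dash. I have not checked how every collation treats those characters, so it is worth testing on your own server.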

Try

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z @\.\-]%'

It looks like the position of the - sign in your version is causing the problem.

Use a double negative:

... WHERE email NOT LIKE '%[^0-9a-zA-Z ,-@\.]%'

Sample data would be useful too.
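A quick sketch of what the double-negative form does, using made-up rows in a table variable called @sample. To keep the behaviour easy to follow it uses the hyphen-first character set from the question's edit rather than the exact set in this answer. NOT LIKE combined with a negated set keeps the rows in which every character is allowed, which is the complement of the "bad data" query.

```sql
-- Hypothetical sample rows; NCHAR(8211) is an en dash.
DECLARE @sample TABLE (email NVARCHAR(100));

INSERT INTO @sample (email)
VALUES (N'jane.doe@example.com'),
       (N'mary-ann@example.com'),
       (N'john' + NCHAR(8211) + N'smith@example.com');

-- Rows made up entirely of allowed characters (the clean rows).
SELECT email
FROM @sample
WHERE email NOT LIKE N'%[^-0-9a-zA-Z_@.]%';

-- Rows containing at least one character outside the set (the suspect rows).
SELECT email
FROM @sample
WHERE email LIKE N'%[^-0-9a-zA-Z_@.]%';
```

One thing to keep in mind: a NULL email satisfies neither predicate, so such rows appear in neither result.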

Presumably, every email has an @ character as well as a . character. You might try:

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z ,\]%'

If your original list is what you really want, then you need to escape the -:

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z ,\-@\.]%'
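One caveat, as far as I understand the LIKE documentation: the backslash only acts as an escape character when it is declared with an ESCAPE clause. A sketch of the escaped form is below, with an invented table variable @sample. The dot is not a wildcard, so it is left unescaped here.

```sql
-- Hypothetical sample rows; NCHAR(8211) is an en dash.
DECLARE @sample TABLE (email NVARCHAR(100));

INSERT INTO @sample (email)
VALUES (N'jane.doe@example.com'),
       (N'mary-ann@example.com'),
       (N'john' + NCHAR(8211) + N'smith@example.com');

-- ESCAPE declares \ as the escape character, so \- inside the brackets
-- stands for a literal hyphen instead of starting a range.
SELECT email
FROM @sample
WHERE email LIKE N'%[^0-9a-zA-Z ,\-@.]%' ESCAPE N'\';
```

Placing the hyphen first inside the brackets avoids the need for an escape character altogether, which is probably the simpler option.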

