
Identify Hidden Characters

In my SQL tables I have text that contains hidden characters, which only become visible when I copy and paste the text into Notepad++.

How can I find the rows that contain hidden characters using SQL Server queries?

I have tried comparing the lengths using DATALENGTH and LEN, but it did not work.

WHERE DATALENGTH(name) != LEN(name)

I want the rows that have hidden characters.

I am assuming this is caused by control characters. Some of them are invisible, and the group also includes tabs and newlines. Here is an example that illustrates the problem and shows how to make the affected rows appear.

-- DROP TABLE #SillyTemp

-- Two invisible control characters and one ordinary character ('!') for comparison.
DECLARE @InvisibleChar1 NCHAR(1) = NCHAR(28), @InvisibleChar2 NCHAR(1) = NCHAR(30), @NonControlChar NCHAR(1) = NCHAR(33);
DECLARE @InputString NVARCHAR(500) = N'Some |' + @InvisibleChar1 + N'| random string |' + @InvisibleChar2 + N'|' + N'; Thank god Finally a normal character |' + @NonControlChar + N'|';
SELECT @InputString AS OhNoInvisibleCharacters;

-- LIKE pattern that matches any character from NCHAR(1) through NCHAR(31).
DECLARE @ControlCharRange NVARCHAR(50) = N'%[' + NCHAR(1) + N'-' + NCHAR(31) + N']%';

CREATE TABLE #SillyTemp
(
    input NVARCHAR(500)
);

INSERT INTO #SillyTemp (input)
VALUES (@InputString), (N'A normal string');

SELECT @ControlCharRange;
-- Only rows that contain at least one control character are returned.
SELECT input FROM #SillyTemp WHERE input LIKE @ControlCharRange;

This produces three result sets. The first is the string with the invisible characters in it, like so:

Some || random string ||; Thank god Finally a normal character |!|

Note that they are actually invisible inside SQL Server, although Stack Overflow shows them as visible symbols. The output in SQL Server is simply:

Some || random string ||; Thank god Finally a normal character |!|

These characters still have a corresponding (N)CHAR(X) value. (N)CHAR(0) is the NULL character and is highly unlikely to appear in a string; in my setup it also causes problems when building a range, so the range starts at 1. (N)CHAR(32) is the space character ' '.
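To see which (N)CHAR value a hidden character actually has, one option is to walk the string position by position and print the code of anything below 32. This is only a minimal sketch: the sample value in @s and the CTE name Positions are made up for illustration, and it relies on the built-in UNICODE and SUBSTRING functions.

```sql
-- Hypothetical sample value: NCHAR(28) and NCHAR(9) are the hidden characters.
DECLARE @s NVARCHAR(100) = N'ab' + NCHAR(28) + N'cd' + NCHAR(9) + N'e';

-- Generate positions 1..LEN(@s) with a recursive CTE,
-- then keep only the positions that hold a control character.
WITH Positions AS
(
    SELECT 1 AS pos
    UNION ALL
    SELECT pos + 1 FROM Positions WHERE pos < LEN(@s)
)
SELECT pos,
       UNICODE(SUBSTRING(@s, pos, 1)) AS code_point
FROM Positions
WHERE UNICODE(SUBSTRING(@s, pos, 1)) < 32
ORDER BY pos
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

For this sample the query should list position 3 with code 28 and position 6 with code 9.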

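The [X-Y] range wildcard of the LIKE operator is also based on the (N)CHAR numbers, so we can build the range [NCHAR(1)-NCHAR(31)]. Note that the range is evaluated using the sorting rules of the collation in effect, so the characters it covers can differ between collations.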
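Because a LIKE range is evaluated with the sorting rules of the collation in effect, one way to make the NCHAR(1)-NCHAR(31) range compare strictly by code value is to force a binary collation for the comparison. The sketch below assumes Latin1_General_BIN2 is acceptable for that purpose; the inline sample rows and the column alias val are made up for illustration.

```sql
DECLARE @ControlCharRange NVARCHAR(50) = N'%[' + NCHAR(1) + N'-' + NCHAR(31) + N']%';

-- Assumption: the binary collation is applied only to this comparison,
-- so the range is interpreted by code value rather than by linguistic sort order.
SELECT v.val
FROM (VALUES (N'tab' + NCHAR(9) + N'here'),
             (N'A normal string')) AS v(val)
WHERE v.val LIKE @ControlCharRange COLLATE Latin1_General_BIN2;
```

With these two sample rows, only the value containing the tab should come back.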

The last SELECT goes through the temporary table, which contains one row with invisible characters. Since we're looking for any NCHAR between 1 and 31, only rows with invisible characters (often invalid characters, tabs, or newlines) satisfy the WHERE condition, so only they are returned. In this case, only the 'faulty' string is returned by my SELECT statement.
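Applied to a table with a name column like the one in the question, the same pattern can also report where the first hidden character sits, using PATINDEX. This is a sketch under assumptions: the #Names table, its id column, and the sample rows are hypothetical, and the binary collation is my choice to keep the range tied to code values.

```sql
-- Hypothetical stand-in for the real table.
CREATE TABLE #Names (id INT, name NVARCHAR(100));

INSERT INTO #Names (id, name)
VALUES (1, N'clean'),
       (2, N'bad' + NCHAR(30) + N'value');

DECLARE @ControlCharRange NVARCHAR(50) = N'%[' + NCHAR(1) + N'-' + NCHAR(31) + N']%';

-- Return only the affected rows, plus the 1-based position of the first control character.
SELECT id,
       name,
       PATINDEX(@ControlCharRange, name COLLATE Latin1_General_BIN2) AS first_hidden_pos
FROM #Names
WHERE name LIKE @ControlCharRange COLLATE Latin1_General_BIN2;

DROP TABLE #Names;
```

For the sample data this should return only the row with id 2, with first_hidden_pos equal to 4.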
