简体   繁体   中英

Special Characters - Sql

How do i get special characters in column on SqlServer ?

I got e-mail list , and i have to find special caracteres like example bellow

**Email** 
JóhnSnow@gmail.com
Khãlessi@gmail.com 

As u see above , there's ' ~ ' and ' ´ 'as special characters . Might be appear others characters like ' .. ' or other else.

Im working on Sql Server 2012 ,

Anyone has suggestion to solve it ?

To extract the special characters you would first need to split your string into rows, so you can query each individually, which you can do with a numbers table. If you don't have one, they are very easy to create on the fly:

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  Number
FROM    Numbers;

This gives a list of numbers from 1-10000. More on this here .

Then you can join this to your data with the condition Number < LEN(Email) to ensure you get one row back for each character in the email, then use SUBSTRING() to extract the character at the position n :

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  t.ID, 
        t.Email, 
        Character = SUBSTRING(t.Email, n.Number, 1)
FROM    @T AS t
        INNER JOIN Numbers n    
            ON n.Number < LEN(t.Email)
ORDER BY t.ID;

Which gives:

ID  Email                   Character
-----------------------------
1   JóhnSnow@gmail.com      J
1   JóhnSnow@gmail.com      ó
1   JóhnSnow@gmail.com      h
1   JóhnSnow@gmail.com      n
1   JóhnSnow@gmail.com      S
1   JóhnSnow@gmail.com      n
1   JóhnSnow@gmail.com      ó
1   JóhnSnow@gmail.com      w
.....

Then you can extract the special characters by converting them to VARCHAR with the collation SQL_Latin1_General_Cp1251_CS_AS , and checking that to the original:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
(   SELECT  t.ID,  
            t.Email, 
            Character = SUBSTRING(t.Email, n.Number, 1), 
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n    
                ON n.Number < LEN(t.Email)
)
SELECT  ac.ID, ac.Character, ac.Position
FROM    AllCharacters AS ac
WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
ORDER BY ac.ID;

Result

ID  Email                   Character   Position
----------------------------------------------------
1   JóhnSnów@gmail.com          ó           2
1   JóhnSnów@gmail.com          ó           7
2   Khãlessi@gmail.com          ã           3

Then finally, if required you can use XML extensions to concatenate these characters into a single column:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
(   SELECT  t.ID,  
            t.Email, 
            Character = SUBSTRING(t.Email, n.Number, 1), 
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n    
                ON n.Number < LEN(t.Email)
), SpecialCharacters AS
(   SELECT  ac.ID, ac.Character, ac.Position
    FROM    AllCharacters AS ac
    WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
)
SELECT  t.ID,
        t.Email,
        SpecialCharacters = ISNULL(STUFF(s.SpecialCharacterList.value('.', 'NVARCHAR(255)'), 1, 2, ''), '')
FROM    @T AS T
        CROSS APPLY
        (   SELECT  CONCAT(N', ', s.Character, '(', Position, ')')
            FROM    SpecialCharacters AS s
            WHERE   s.ID = t.ID
            ORDER BY Position
            FOR XML PATH(''), TYPE
        ) s (SpecialCharacterList)
ORDER BY ID;

Result

ID  Email                   SpecialCharacters
------------------------------------------------
1   JóhnSnów@gmail.com      ó(2), ó(7)
2   Khãlessi@gmail.com      ã(3)
3   NedStark@gmail.com  

As an aside, it may be better suited to your needs to store in a table what you count as special characters rather than relying on the code pages for specific collations, if you were to do this, you would just need to change this line:

WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character

For:

WHERE EXISTS (SELECT 1 FROM MySpecialCharacterTable AS sct WHERE sct.Character = ac.Character)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM