
Special Characters - SQL

How do I get the special characters in a column in SQL Server?

I have a list of e-mail addresses, and I have to find special characters, as in the example below:

**Email** 
JóhnSnow@gmail.com
Khãlessi@gmail.com 

As you can see above, '~' and '´' appear as special characters. Other characters, such as '..', or something else entirely, might also appear.

I'm working on SQL Server 2012.

Does anyone have a suggestion for solving this?

To extract the special characters, you first need to split each string into rows (one row per character) so you can query each character individually. You can do this with a numbers table. If you don't have one, they are very easy to create on the fly:

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  Number
FROM    Numbers;

This gives a list of numbers from 1 to 10,000 (10 rows cross joined to give 100, then 100 cross joined to give 10,000). More on this here.
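To see why the cascaded CTEs produce exactly 10,000 rows, here is a minimal Python sketch of the same cross-join arithmetic. It is only an illustration of the row counts, not SQL Server code; the function name `numbers_table` is hypothetical.

```python
from itertools import product


def numbers_table():
    # N1: ten rows, like VALUES (1),(1),... repeated ten times.
    n1 = [1] * 10
    # N2: N1 CROSS JOIN N1 -> 10 * 10 = 100 rows.
    n2 = [1 for _left, _right in product(n1, n1)]
    # N3: N2 CROSS JOIN N2 -> 100 * 100 = 10,000 rows.
    n3 = [1 for _left, _right in product(n2, n2)]
    # ROW_NUMBER() OVER (ORDER BY N): number the rows 1, 2, ..., 10,000.
    return [number for number, _row in enumerate(n3, start=1)]


numbers = numbers_table()
print(len(numbers), numbers[0], numbers[-1])  # 10000 1 10000
```

Adding one more level (N4 as N3 CROSS JOIN N3) would give 100,000,000 rows, so three levels are plenty for e-mail addresses of up to 255 characters.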

Then you can join this to your data with the condition Number <= LEN(Email), which returns one row for each character in the email, and use SUBSTRING() to extract the character at position Number:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  t.ID, 
        t.Email, 
        Character = SUBSTRING(t.Email, n.Number, 1)
FROM    @T AS t
        INNER JOIN Numbers n    
            ON n.Number <= LEN(t.Email)
ORDER BY t.ID, n.Number;

Which gives:

ID  Email                   Character
-------------------------------------
1   JóhnSnów@gmail.com      J
1   JóhnSnów@gmail.com      ó
1   JóhnSnów@gmail.com      h
1   JóhnSnów@gmail.com      n
1   JóhnSnów@gmail.com      S
1   JóhnSnów@gmail.com      n
1   JóhnSnów@gmail.com      ó
1   JóhnSnów@gmail.com      w
.....
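For comparison, the split step can be sketched in Python as below. It assumes the rows are `(id, email)` tuples and uses a hypothetical helper name; positions run from 1 to the length of the string, and `SUBSTRING(Email, Number, 1)` becomes `email[Number - 1]`. Two small differences from T-SQL are worth keeping in mind: Python's `len` counts trailing spaces (T-SQL `LEN` does not), and Python indexes code points where `NVARCHAR` uses UTF-16 code units, which only matters for characters outside the Basic Multilingual Plane.

```python
def split_characters(rows):
    """Return one (id, email, character) tuple per character of each email.

    Mirrors the join on the numbers table: Number runs from 1 to the
    length of the email, and SUBSTRING(Email, Number, 1) is email[Number - 1].
    """
    result = []
    for row_id, email in rows:
        for number in range(1, len(email) + 1):
            result.append((row_id, email, email[number - 1]))
    return result


rows = split_characters([(1, "JóhnSnów@gmail.com"), (2, "Khãlessi@gmail.com")])
print([character for row_id, _email, character in rows if row_id == 1][:8])
# ['J', 'ó', 'h', 'n', 'S', 'n', 'ó', 'w']
```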

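Then you can extract the special characters by converting each one to a single-byte CHAR with the collation SQL_Latin1_General_Cp1251_CS_AS and comparing the result with the original. Code page 1251 (Cyrillic) keeps the ASCII characters but has no accented Latin letters, so a character such as 'ó' is changed by the conversion and no longer matches the original: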

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
(   SELECT  t.ID,  
            t.Email, 
            Character = SUBSTRING(t.Email, n.Number, 1), 
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n    
                ON n.Number <= LEN(t.Email)
)
SELECT  ac.ID, ac.Email, ac.Character, ac.Position
FROM    AllCharacters AS ac
WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
ORDER BY ac.ID, ac.Position;

Result:

ID  Email                   Character   Position
----------------------------------------------------
1   JóhnSnów@gmail.com      ó           2
1   JóhnSnów@gmail.com      ó           7
2   Khãlessi@gmail.com      ã           3
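The code-page round trip can be sketched in Python as follows, assuming Python's built-in `cp1251` codec as a stand-in for the collation's code page. Python substitutes '?' for unmappable characters, whereas SQL Server may pick a best-fit character such as 'o'; in both cases the round-tripped character differs from the original, which is all the filter relies on. The function name is hypothetical.

```python
def special_characters(row_id, email, code_page="cp1251"):
    """Return (id, email, character, position) for every character that does
    not survive a round trip through a single-byte code page."""
    found = []
    for position, character in enumerate(email, start=1):
        # Characters the code page cannot represent come back as '?'.
        round_trip = character.encode(code_page, errors="replace").decode(code_page)
        if round_trip != character:
            found.append((row_id, email, character, position))
    return found


for row_id, email in [(1, "JóhnSnów@gmail.com"), (2, "Khãlessi@gmail.com"), (3, "NedStark@gmail.com")]:
    for match in special_characters(row_id, email):
        print(match)
```

A literal '?' in the input round-trips to itself, so it is not reported; that matches the T-SQL comparison.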

Finally, if required, you can use the XML extensions (FOR XML PATH) to concatenate these characters into a single column:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
(   SELECT  t.ID,  
            t.Email, 
            Character = SUBSTRING(t.Email, n.Number, 1), 
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n    
                ON n.Number <= LEN(t.Email)
), SpecialCharacters AS
(   SELECT  ac.ID, ac.Character, ac.Position
    FROM    AllCharacters AS ac
    WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
)
SELECT  t.ID,
        t.Email,
        SpecialCharacters = ISNULL(STUFF(s.SpecialCharacterList.value('.', 'NVARCHAR(255)'), 1, 2, ''), '')
FROM    @T AS t
        CROSS APPLY
        (   SELECT  CONCAT(N', ', s.Character, '(', Position, ')')
            FROM    SpecialCharacters AS s
            WHERE   s.ID = t.ID
            ORDER BY Position
            FOR XML PATH(''), TYPE
        ) s (SpecialCharacterList)
ORDER BY ID;

Result:

ID  Email                   SpecialCharacters
------------------------------------------------
1   JóhnSnów@gmail.com      ó(2), ó(7)
2   Khãlessi@gmail.com      ã(3)
3   NedStark@gmail.com  
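The `STUFF(... FOR XML PATH(''), TYPE ...)` pattern builds ", ó(2), ó(7)" and then strips the leading two characters. A minimal Python sketch of the same output, again assuming the `cp1251` codec as the test, uses `", ".join`, which never produces a leading separator and returns an empty string when nothing matches (the role `ISNULL(..., '')` plays in the query). The function name is hypothetical.

```python
def special_character_list(email, code_page="cp1251"):
    """Build a 'ó(2), ó(7)' style summary of the characters in `email`
    that do not survive a round trip through `code_page`."""
    parts = []
    for position, character in enumerate(email, start=1):
        if character.encode(code_page, errors="replace").decode(code_page) != character:
            # Same shape as CONCAT(s.Character, '(', Position, ')').
            parts.append(f"{character}({position})")
    # join() adds separators only between items, so no STUFF-style trim is needed.
    return ", ".join(parts)


for email in ["JóhnSnów@gmail.com", "Khãlessi@gmail.com", "NedStark@gmail.com"]:
    print(repr(special_character_list(email)))
```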

As an aside, it may suit your needs better to store what you count as special characters in a table, rather than relying on the code page of a specific collation. If you do this, you just need to change this line:

WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character

to:

WHERE EXISTS (SELECT 1 FROM MySpecialCharacterTable AS sct WHERE sct.Character = ac.Character)
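The lookup-table variant amounts to a membership test. The Python sketch below assumes a hypothetical set standing in for the contents of MySpecialCharacterTable; note that Python's `in` is an exact match, whereas `sct.Character = ac.Character` in SQL Server follows the column collation, so a case- or accent-insensitive collation could match more characters than this sketch does.

```python
# Hypothetical contents of MySpecialCharacterTable; adjust to your own rules.
MY_SPECIAL_CHARACTERS = {"ó", "ã", "~", "´"}


def special_characters_from_table(email, special=MY_SPECIAL_CHARACTERS):
    """Return (character, position) pairs for characters listed in `special`,
    mirroring WHERE EXISTS (SELECT 1 FROM MySpecialCharacterTable ...)."""
    return [
        (character, position)
        for position, character in enumerate(email, start=1)
        if character in special
    ]


print(special_characters_from_table("JóhnSnów@gmail.com"))
```

Keeping the list in data rather than in a collation means that adding a new character (for example '..' as the question mentions) only requires a row change, though multi-character sequences would need a different comparison than this single-character check.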

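Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.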

 