Extract names from varchar column in SQL Server

This should be a simple task in SQL, but I can't find a solution for it.

I have the two tables below, named summary and user:

+----+--------------------------------------------------------------------------------------------+
| id |                                          summary                                           |
+----+--------------------------------------------------------------------------------------------+
|  1 | asdffgggggg Anand   * edkkofffmfmmfmfm Bala          sdkdodkekeke Chandra dkkdkd "vinoth"* |
|  2 | asdffgggggg Dinesh  * edkkofffmfmmfmfm Frankin       sdkdodkekeke Elisia  dkkdkd  Ganesh.  |
|  3 | asdffgggggg Hansika  edkkofffmfmmfmfm [A.Ishwariya]* sdkdodkekeke Jack    dkkdkd "Lalitha" |
+----+--------------------------------------------------------------------------------------------+

+----+-------------+
| id |    name     |
+----+-------------+
|  1 | A.Ishwariya |
|  2 | Anand       |
|  3 | Bala        |
|  4 | Chandra     |
|  5 | Dinesh      |
|  6 | Elisia      |
|  7 | Frankin     |
|  8 | Ganesh      |
|  9 | Hansika     |
| 10 | Jack        |
| 11 | Lalitha     |
| 12 | Vinoth      |
+----+-------------+
  • I want to get all the names from the summary column that are followed by *

Output 1:

╔════╦═════════════╗
║ id ║    name     ║
╠════╬═════════════╣
║  1 ║ Anand       ║
║  1 ║ Vinoth      ║
║  2 ║ Dinesh      ║
║  3 ║ A.Ishwariya ║
╚════╩═════════════╝
  • I want to get all the names from the summary column that are not followed by *

Output 2:

╔════╦═════════╗
║ id ║  name   ║
╠════╬═════════╣
║  1 ║ Bala    ║
║  1 ║ Chandra ║
║  2 ║ Frankin ║
║  2 ║ Elisia  ║
║  2 ║ Ganesh  ║
║  3 ║ Hansika ║
║  3 ║ Jack    ║
║  3 ║ Lalitha ║
╚════╩═════════╝

Any help will be much appreciated.

Despite some missing details, I managed to put together an answer. Assuming the data looks like this:

Create table dbo.[Summary]
(
   id int not null
  ,summary varchar(2000) not null
)
GO

insert into dbo.[Summary] 
values
  (1, 'asdffgggggg Anand   * edkkofffmfmmfmfm Bala          sdkdodkekeke Chandra dkkdkd "vinoth"*')
 ,(2, 'asdffgggggg Dinesh  * edkkofffmfmmfmfm Frankin       sdkdodkekeke Elisia  dkkdkd  Ganesh')
 ,(3, 'asdffgggggg Hansika  edkkofffmfmmfmfm [A.Ishwariya]* sdkdodkekeke Jack    dkkdkd "Lalitha"')
 GO

We first need to clean that messy data this way:

update s
set s.summary = replace(s.summary,'[','')
 from dbo.[Summary] s

update s
set s.summary = replace(s.summary,']','')
 from dbo.[Summary] s

update s
set s.summary = replace(s.summary,'"','')
 from dbo.[Summary] s

 
while exists(
 select *
 from dbo.[Summary] s
 where charindex('  ',s.summary) > 0
)
begin
    update s
    set s.summary = replace(s.summary, '  ',' ')
    from dbo.[Summary] s
end

update s
set s.summary = replace(s.summary, ' *','*')
from dbo.[Summary] s
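
As an alternative to the UPDATE loop above, the same cleanup can be sketched as a single read-only SELECT. This is only a sketch under assumptions: the table variable @Summary is a hypothetical stand-in for dbo.[Summary], and the space-collapsing trick only works if the characters < and > never appear in the data.

```sql
-- Hypothetical stand-in for dbo.[Summary], so the sketch runs on its own.
DECLARE @Summary TABLE (id int NOT NULL, summary varchar(2000) NOT NULL);

INSERT INTO @Summary (id, summary) VALUES
  (1, 'asdffgggggg Anand   * edkkofffmfmmfmfm Bala          sdkdodkekeke Chandra dkkdkd "vinoth"*'),
  (2, 'asdffgggggg Dinesh  * edkkofffmfmmfmfm Frankin       sdkdodkekeke Elisia  dkkdkd  Ganesh'),
  (3, 'asdffgggggg Hansika  edkkofffmfmmfmfm [A.Ishwariya]* sdkdodkekeke Jack    dkkdkd "Lalitha"');

SELECT s.id,
       REPLACE(                               -- 3) glue each * to the preceding name
         REPLACE(REPLACE(REPLACE(             -- 2) collapse runs of spaces to one space
           REPLACE(REPLACE(REPLACE(s.summary, '[', ''), ']', ''), '"', ''),  -- 1) strip [ ] "
           ' ', '<>'),                        -- every space becomes the marker <>
           '><', ''),                         -- adjacent markers merge into a single <>
           '<>', ' '),                        -- the remaining marker becomes one space
         ' *', '*') AS cleaned
FROM @Summary s;
```

Nothing is modified here; the cleaned value could be written back with an UPDATE or reused in a CTE if that suits the workflow better.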

Now the extra spaces and special characters are gone. Next we need to find the position of each space.

Note: I'm assuming the "structure" of the data is invariant. Of course it is possible to process variable structures (a varying number of names in each row, for example), but that is more complicated and may need recursion, loops, etc.
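
For the variable-structure case, one possible approach is a recursive CTE that peels off one space-separated token at a time. The sketch below is an illustration, not the method used in this answer. It assumes the text is already cleaned (single spaces, * glued to the name) and that filler words and names strictly alternate, so names sit at even token positions. The table variable and the sample rows with two and three names are made up for the example.

```sql
-- Hypothetical cleaned rows with a varying number of names per row.
DECLARE @Summary TABLE (id int NOT NULL, summary varchar(2000) NOT NULL);

INSERT INTO @Summary (id, summary) VALUES
  (1, 'asdffgggggg Anand* edkkofffmfmmfmfm Bala'),
  (2, 'asdffgggggg Dinesh* edkkofffmfmmfmfm Frankin sdkdodkekeke Elisia');

WITH Tokens AS (
    -- Anchor: first token of each row; a trailing space guarantees CHARINDEX finds one.
    SELECT s.id,
           1 AS pos,
           CAST(LEFT(s.summary + ' ', CHARINDEX(' ', s.summary + ' ') - 1) AS varchar(2000)) AS token,
           CAST(SUBSTRING(s.summary + ' ', CHARINDEX(' ', s.summary + ' ') + 1, 2000) AS varchar(2000)) AS rest
    FROM @Summary s
    UNION ALL
    -- Recursive step: take the next token from what is left of the row.
    SELECT t.id,
           t.pos + 1,
           CAST(LEFT(t.rest, CHARINDEX(' ', t.rest) - 1) AS varchar(2000)),
           CAST(SUBSTRING(t.rest, CHARINDEX(' ', t.rest) + 1, 2000) AS varchar(2000))
    FROM Tokens t
    WHERE t.rest <> ''            -- stop once the row is used up
)
SELECT t.id,
       REPLACE(t.token, '*', '') AS name,
       CASE WHEN t.token LIKE '%*' THEN 1 ELSE 0 END AS starred
FROM Tokens t
WHERE t.pos % 2 = 0               -- assumption: names are the 2nd, 4th, 6th ... token
ORDER BY t.id, t.pos;
```

Rows with more than 100 tokens would hit the default recursion limit, so OPTION (MAXRECURSION n) would be needed on the final SELECT in that case.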

declare @Spaces as Table
(
   SummaryId int not null
  ,Space01 int not null
  ,Space02 int null
  ,Space03 int null
  ,Space04 int null
  ,Space05 int null
  ,Space06 int null
  ,Space07 int null
  ,Space08 int null
)

insert into @Spaces
(SummaryId, Space01)
select s.id, charindex(' ',s.summary)
from dbo.[Summary] s

update sp set sp.Space02 = charindex(' ', s.summary, sp.Space01 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space03 = charindex(' ', s.summary, sp.Space02 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space04 = charindex(' ', s.summary, sp.Space03 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space05 = charindex(' ', s.summary, sp.Space04 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space06 = charindex(' ', s.summary, sp.Space05 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space07 = charindex(' ', s.summary, sp.Space06 +1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
update sp set sp.Space08 = len(s.summary)+1 from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId

--select * from @Spaces

declare @Names as Table
(
   SummaryId int not null
  ,Name varchar(200) not null
)


-- Start one character after each space so the extracted name has no leading space
insert into @Names select s.id, SUBSTRING(s.summary, sp.Space01 + 1, sp.Space02 - sp.Space01 - 1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
insert into @Names select s.id, SUBSTRING(s.summary, sp.Space03 + 1, sp.Space04 - sp.Space03 - 1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
insert into @Names select s.id, SUBSTRING(s.summary, sp.Space05 + 1, sp.Space06 - sp.Space05 - 1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId
insert into @Names select s.id, SUBSTRING(s.summary, sp.Space07 + 1, sp.Space08 - sp.Space07 - 1) from @Spaces sp join dbo.[Summary] s on s.id = sp.SummaryId

--select * from @Names
select n.SummaryId, replace(n.Name, '*','') as Name from @Names n where charindex('*',n.Name) > 0
select n.SummaryId, n.Name from @Names n where charindex('*',n.Name) = 0

Once we know the positions of all the spaces that separate the tokens, we can use them to extract the names (and surnames?).

This produces the desired output.

Edit

I built this solution before the OP posted the USER table. Here I'm just handling badly formatted data and playing with strings.

Using that USER table makes things a lot easier: just take each name and search for it in the summary.
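
A minimal sketch of that idea, assuming the summary text has already been cleaned as above (single spaces, no brackets or quotes, * glued to the name). The table variables @Summary and @User are hypothetical stand-ins for the real tables. The match relies on a case-insensitive collation (so 'Vinoth' finds 'vinoth') and on the names containing no LIKE wildcards such as %, _ or [.

```sql
-- Hypothetical stand-ins for the summary and user tables.
DECLARE @Summary TABLE (id int NOT NULL, summary varchar(2000) NOT NULL);
DECLARE @User TABLE (id int NOT NULL, name varchar(50) NOT NULL);

INSERT INTO @Summary (id, summary) VALUES
  (1, 'asdffgggggg Anand* edkkofffmfmmfmfm Bala sdkdodkekeke Chandra dkkdkd vinoth*'),
  (2, 'asdffgggggg Dinesh* edkkofffmfmmfmfm Frankin sdkdodkekeke Elisia dkkdkd Ganesh'),
  (3, 'asdffgggggg Hansika edkkofffmfmmfmfm A.Ishwariya* sdkdodkekeke Jack dkkdkd Lalitha');

INSERT INTO @User (id, name) VALUES
  (1, 'A.Ishwariya'), (2, 'Anand'), (3, 'Bala'), (4, 'Chandra'),
  (5, 'Dinesh'), (6, 'Elisia'), (7, 'Frankin'), (8, 'Ganesh'),
  (9, 'Hansika'), (10, 'Jack'), (11, 'Lalitha'), (12, 'Vinoth');

-- Names followed by *: pad the summary with spaces so only whole words match.
SELECT s.id, u.name
FROM @Summary s
JOIN @User u
  ON ' ' + s.summary + ' ' LIKE '% ' + u.name + '* %'
ORDER BY s.id;

-- Names not followed by *: the name must be followed directly by a space.
SELECT s.id, u.name
FROM @Summary s
JOIN @User u
  ON ' ' + s.summary + ' ' LIKE '% ' + u.name + ' %'
ORDER BY s.id;
```

Because the patterns start with a wildcard, this scans every summary for every user, which is fine for small tables but will not scale to large ones.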

Here you go. SQL Fiddle here: http://sqlfiddle.com/#!3/c4b9e/1 :

CREATE FUNCTION [dbo].[SplitString]
(
   @CSVString NVARCHAR(MAX),
   @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
   WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                     UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                     UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
   E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
   E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
   E42(N)       AS (SELECT 1 FROM E4 a, E4 b),
   cteTally(N)  AS (SELECT 0 
                    UNION ALL 
                    SELECT TOP (DATALENGTH(ISNULL(@CSVString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
   cteStart(N1) AS (SELECT t.N+1 
                    FROM cteTally t
                     WHERE (SUBSTRING(@CSVString,t.N,1) = @Delimiter OR t.N = 0))
    SELECT Item = SUBSTRING(@CSVString, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@CSVString,s.N1),0)-s.N1,8000))
FROM cteStart s;

CREATE TABLE Summary (Id INT, Summary NVARCHAR(4000))
CREATE TABLE [User] (NAME NVARCHAR(50))

INSERT INTO Summary (Id,Summary)
SELECT 1,'asdffgggggg Anand   * edkkofffmfmmfmfm Bala          sdkdodkekeke Chandra dkkdkd "vinoth"*'
UNION ALL
SELECT 2,'asdffgggggg Dinesh  * edkkofffmfmmfmfm Frankin       sdkdodkekeke Elisia  dkkdkd  Ganesh.'
UNION ALL
SELECT 3,'asdffgggggg Hansika  edkkofffmfmmfmfm [A.Ishwariya]* sdkdodkekeke Jack    dkkdkd "Lalitha"'

INSERT INTO [User] (Name)
SELECT 'Anand'
UNION ALL
SELECT 'Vinoth'
UNION ALL
SELECT 'Dinesh'
UNION ALL
SELECT 'A.Ishwariya'

This is your query to return names from summary column that have a *:

SELECT Data.Id, Data.Summ
FROM (
    SELECT Id, Item,
           -- last word of each chunk produced by splitting on '*', stripped of ", [ and ]
           REPLACE(REPLACE(REPLACE(SUBSTRING(RTRIM(Item), LEN(RTRIM(Item)) + 2 - CHARINDEX(' ', REVERSE(RTRIM(Item))), 4000), '"', ''), '[', ''), ']', '') AS Summ
    FROM Summary
    CROSS APPLY (SELECT Item FROM SplitString(Summary, '*')) Map
) Data
INNER JOIN [User] ON Data.Summ = [User].Name

As a side note, this would be much simpler if the data in your summary column were stored in multiple columns instead of one.
