
TSQL: Find a date with varying characters in a string

I need to find a date in a string from the column Filename. The parts of the date are separated by dashes (or another character, like an underscore), and the string contains other numbers as well, but I only need the date.

The date needs to be extracted from the filename. (I know the data is messy; multiple vendors with multiple file naming formats are the cause.)

This question is similar to an earlier one, but it has a different requirement: TSQL: Find a continuous number in a string

Desired result:

(image not available)

Actual Result:

(image not available)

Test Code:

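-- Drop the temp table only if it already exists, so the script can be re-run
IF OBJECT_ID('tempdb..#dob') IS NOT NULL DROP TABLE #dob;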

CREATE TABLE #dob (
 FILENAME VARCHAR(MAX)
,StudentID INT
)

INSERT INTO #dob
( FILENAME  )
VALUES
 ('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' )

--This is what I tried.

SELECT FILENAME
, substring(FileName, patindex('%[0-9][%-%][%_%][0-9][0-9][0-9][0-9][0-9]%', FileName), 8) AS dob
FROM #dob

Try it like this:

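-- Drop the temp table only if it already exists, so the script can be re-run
IF OBJECT_ID('tempdb..#StuID') IS NOT NULL DROP TABLE #StuID;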
GO
CREATE TABLE #StuID (
 FILENAME VARCHAR(MAX)
,StudentID INT
)

INSERT INTO #StuID
( FILENAME  )
VALUES
 ('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' );

WITH Casted([FileName],ToXml) AS
(
    SELECT [FILENAME] 
          ,CAST('<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE([FILENAME],'.',' '),'-',' '),'_',' '),',',' '),' ','</x><x>') + '</x>' AS XML)
    FROM #StuID
)
SELECT [FileName] 
      ,numberFragments.value('/x[.>=1 and .<=31][1]','int') AS MonthFragment --using <=12 might bring back the second fragment twice...
      ,numberFragments.value('/x[.>=1 and .<=31][2]','int') AS DayFragment
      ,numberFragments.value('/x[.>=1960 and .<=2050][1]','int') AS YearFragment
      ,numberFragments.value('/x[.>=100000 and .<=10000000][1]','int') AS StudId
FROM Casted
CROSS APPLY (SELECT ToXml.query('/x[not(empty(. cast as xs:int?))]')) A(numberFragments);

The idea in short:

As in the previous answer, we break the string into XML and filter for the fragments that can be cast to int.
The magic is in the XQuery filtering:

  • We pick the first fragment between 1 and 31, which is the month hopefully
  • We pick the second fragment between 1 and 31 which is the day hopefully
  • We pick the first fragment between 1960 and 2050 which is the year hopefully
  • And we pick the student's id, which is the first fragment between 100000 and 10000000.

Hint: It looks like a nice idea to use <=12 for the month fragment, but I'd use the same filter for day and month to make sure that we pick the first and the second fragment from the same value range.
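
If a single date value is wanted rather than three separate integers, the fragments can be recombined. The following is only a sketch: it assumes SQL Server 2012 or later (for DATEFROMPARTS), uses a variable @f in place of one row of the table, and the names Parts, y, m, d and dob are made up for illustration. Note that DATEFROMPARTS raises an error for impossible dates, so the CASE guards the month only; a day such as 31 in a 30-day month would still fail.

```sql
-- Sketch only: @f stands in for one row of the FILENAME column.
DECLARE @f VARCHAR(MAX) = 'Doe Jane, _01_22_1980_123456.pdf';

WITH Casted(ToXml) AS
(
    -- Same splitting trick as above: every separator becomes an element boundary
    SELECT CAST('<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@f,'.',' '),'-',' '),'_',' '),',',' '),' ','</x><x>') + '</x>' AS XML)
)
,Parts(y,m,d) AS
(
    SELECT numberFragments.value('/x[.>=1960 and .<=2050][1]','int')
          ,numberFragments.value('/x[.>=1 and .<=31][1]','int')
          ,numberFragments.value('/x[.>=1 and .<=31][2]','int')
    FROM Casted
    CROSS APPLY (SELECT ToXml.query('/x[not(empty(. cast as xs:int?))]')) A(numberFragments)
)
SELECT y, m, d
      -- DATEFROMPARTS returns NULL when any part is NULL;
      -- the CASE skips the call when the month is not plausible.
      ,CASE WHEN m <= 12 THEN DATEFROMPARTS(y, m, d) END AS dob
FROM Parts;
```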

I don't think you have the pattern quite right. Also, you can use a CASE expression to return NULL when no date is found:

SELECT FILENAME,
       (CASE WHEN FileName LIKE '%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%'
             THEN substring(FileName, patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 10)
        END) AS dob
FROM #dob;

You can also dispense with the CASE and use NULLIF() :

   substring(FileName, NULLIF(patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 0), 10) as dob
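
To see why the NULLIF() version gives NULL for a filename without a date: PATINDEX returns 0 when nothing matches, NULLIF turns that 0 into NULL, and SUBSTRING returns NULL when its start position is NULL. A small self-contained check, using two of the sample filenames as literals (the aliases v, pos and dob are only illustrative):

```sql
SELECT v.FileName,
       -- 0 means "no match"
       PATINDEX('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', v.FileName) AS pos,
       -- NULLIF(..., 0) converts "no match" into NULL, so SUBSTRING yields NULL
       SUBSTRING(v.FileName,
                 NULLIF(PATINDEX('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', v.FileName), 0),
                 10) AS dob
FROM (VALUES ('Smith John D, 11-23-1980, 1234567.pdf'),
             ('John Doe, 567891.pdf')) AS v(FileName);
```

Without the NULLIF(), the second row would not be NULL: SUBSTRING with a start position of 0 and a length of 10 returns the first nine characters of the filename.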

Another method (after using PATINDEX to find the date) is to force the string into the MM/dd/yyyy format and then use an explicit style for the conversion:

SELECT *,
       TRY_CONVERT(date,STUFF(STUFF(SUBSTRING(d.FILENAME,V.I, 10),3,1,'/'),6,1,'/'),101) AS dob
FROM #dob d
     CROSS APPLY (VALUES(NULLIF(PATINDEX('%[0-9][0-9]_[0-9][0-9]_[0-9][0-9][0-9][0-9]%',d.[FILENAME]),0))) V(I);
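
The two STUFF calls overwrite whatever separator sits at positions 3 and 6 of the ten-character match with a slash, so both 11-23-1980 and 01_22_1980 end up in the MM/dd/yyyy shape that style 101 expects. A minimal illustration on literals (the alias s and the column names fragment, normalized and dob are just for this example; TRY_CONVERT needs SQL Server 2012 or later):

```sql
SELECT s.fragment,
       -- Replace the separator characters at positions 3 and 6 with '/'
       STUFF(STUFF(s.fragment, 3, 1, '/'), 6, 1, '/') AS normalized,
       -- Style 101 = MM/dd/yyyy; TRY_CONVERT returns NULL instead of failing
       TRY_CONVERT(date, STUFF(STUFF(s.fragment, 3, 1, '/'), 6, 1, '/'), 101) AS dob
FROM (VALUES ('11-23-1980'),
             ('01_22_1980'),
             ('99-99-1980')) AS s(fragment);
```

The third row is there to show that a string with the right shape but an impossible month or day comes back as NULL rather than raising a conversion error.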

