I need to find a continuous date in a string from a column named Filename. The string also contains other numbers, separated by dashes (or another character, such as an underscore), but I only need the date portion.
The date needs to be extracted from the filename. (I know the data is messy; multiple vendors with multiple file-naming formats are the cause.)
This is similar to an earlier question, but the requirement here is different: TSQL: Find a continuous number in a string
Desired result:
Actual Result:
Test Code:
DROP TABLE #dob
CREATE TABLE #dob (
FILENAME VARCHAR(MAX)
,StudentID INT
)
INSERT INTO #dob
( FILENAME )
VALUES
('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' )
--This is what I tried.
SELECT FILENAME
, substring(FileName, patindex('%[0-9][%-%][%_%][0-9][0-9][0-9][0-9][0-9]%', FileName), 8) AS dob
FROM #dob
Try it like this:
DROP TABLE #StuID
GO
CREATE TABLE #StuID (
FILENAME VARCHAR(MAX)
,StudentID INT
)
INSERT INTO #StuID
( FILENAME )
VALUES
('Smith John D, 11-23-1980, 1234567.pdf')
,('Doe Jane, _01_22_1980_123456.pdf')
,('John Doe, 567891.pdf' );
WITH Casted([FileName],ToXml) AS
(
SELECT [FILENAME]
,CAST('<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE([FILENAME],'.',' '),'-',' '),'_',' '),',',' '),' ','</x><x>') + '</x>' AS XML)
FROM #StuID
)
SELECT [FileName]
,numberFragments.value('/x[.>=1 and .<=31][1]','int') AS MonthFragment --using <=12 might bring back the second fragment twice...
,numberFragments.value('/x[.>=1 and .<=31][2]','int') AS DayFragment
,numberFragments.value('/x[.>=1960 and .<=2050][1]','int') AS YearFragment
,numberFragments.value('/x[.>=100000 and .<=10000000][1]','int') AS StudId
FROM Casted
CROSS APPLY (SELECT ToXml.query('/x[not(empty(. cast as xs:int?))]')) A(numberFragments);
The idea in short:
As in the previous answer, we break the string into XML and keep only the fragments that can be cast to int.
The magic is the XQuery filtering: each .value() call picks a fragment by position from those whose value falls within a given range.
Hint: It looks like a nice idea to use <=12 for the month fragment, but I'd use the same filter for day and month to make sure that we pick the first and the second fragment of the same value region...
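To make the splitting step easier to follow, here is a minimal sketch that runs the same REPLACE chain and XQuery filter against a single value. The variables @f and @x are hypothetical names introduced only for this illustration; the expressions themselves are copied from the query above.

```sql
-- Sketch only: @f holds one of the sample filenames from the test data.
DECLARE @f VARCHAR(MAX) = 'Doe Jane, _01_22_1980_123456.pdf';

-- Step 1: turn every separator (. - _ ,) into a space,
-- then turn every space into an element boundary.
DECLARE @x XML =
    CAST('<x>'
         + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@f,'.',' '),'-',' '),'_',' '),',',' '),' ','</x><x>')
         + '</x>' AS XML);

-- Step 2: keep only the fragments that can be cast to an integer.
SELECT @x                                              AS AllFragments
      ,@x.query('/x[not(empty(. cast as xs:int?))]')   AS NumberFragments;
```

Comparing the two columns shows which fragments survive the filter. One caveat with this approach: a filename containing a character that is illegal in XML, such as & or <, would make the CAST fail unless it is escaped first.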
I don't think you have the pattern quite right. Also, you can use a CASE expression to return NULL when no date is found:
SELECT FILENAME,
(CASE WHEN FileName LIKE '%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%'
THEN substring(FileName, patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 10)
END) AS dob
FROM #dob;
You can also dispense with the CASE and use NULLIF():
substring(FileName, NULLIF(patindex('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 0), 10) as dob
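For completeness, the NULLIF() expression above can be dropped into a full statement against the #dob test table from the question. This is only a sketch of how the pieces fit together: when PATINDEX returns 0, NULLIF turns it into NULL, and SUBSTRING with a NULL start position returns NULL.

```sql
-- Rows without a date pattern (e.g. 'John Doe, 567891.pdf') yield NULL for dob.
SELECT FILENAME,
       SUBSTRING(FileName,
                 NULLIF(PATINDEX('%[0-9][0-9][-_][0-9][0-9][-_][0-9][0-9][0-9][0-9]%', FileName), 0),
                 10) AS dob
FROM #dob;
```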
Another method (after using PATINDEX to find the date) is to force the string into the MM/dd/yyyy format and then use an explicit style for the conversion:
SELECT *,
TRY_CONVERT(date,STUFF(STUFF(SUBSTRING(d.FILENAME,V.I, 10),3,1,'/'),6,1,'/'),101)
FROM #dob d
CROSS APPLY (VALUES(NULLIF(PATINDEX('%[0-9][0-9]_[0-9][0-9]_[0-9][0-9][0-9][0-9]%',d.[FILENAME]),0))) V(I);
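The normalization step can be seen in isolation with a single literal. In this sketch, @s is a hypothetical variable standing in for the 10-character substring that PATINDEX locates; the two STUFF calls overwrite the separators at positions 3 and 6, and style 101 tells TRY_CONVERT to read the result as mm/dd/yyyy.

```sql
-- Sketch only: @s stands in for the extracted 10-character date string.
DECLARE @s VARCHAR(10) = '01_22_1980';

SELECT STUFF(STUFF(@s, 3, 1, '/'), 6, 1, '/')                          AS Normalized  -- '01/22/1980'
      ,TRY_CONVERT(date, STUFF(STUFF(@s, 3, 1, '/'), 6, 1, '/'), 101)  AS dob;        -- 1980-01-22
```

Note that in the PATINDEX pattern of the query above, the underscore is the single-character wildcard, so it matches a dash, an underscore, or any other separator. TRY_CONVERT then returns NULL for anything that does not form a valid date.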