Finding the Max Date in an XML Column in SQL Server

I have an XML column called RecentlyViewedXml in a table. Its content is structured like this:

<RecentlyViewedEntityData etc="2">
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business 1</Title>        
        <LastAccessed>1/1/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/5/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/3/2010</LastAccessed>
    </RecentlyViewedItem>
</RecentlyViewedEntityData>

I am trying to get the max date from the LastAccessed element (and ideally the rest of the items in the node that holds that date).

I tried several options, but my primary issue is that I don't know whether the last node always has the max date. I was using this, but it failed QA:

Cast(RecentlyViewedXml as xml).query('data(/RecentlyViewedEntityData/RecentlyViewedItem[last()]/LastAccessed[last()])')

I would be open to any ideas.

Thank you!

First of all: it is very dangerous to use culture-dependent date formats. Try this:

SET LANGUAGE GERMAN;
SELECT CAST('1/3/2010' AS DATE);

SET LANGUAGE ENGLISH;
SELECT CAST('1/3/2010' AS DATE);

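The first SELECT returns 2010-03-01 (day first), the second returns 2010-01-03 (month first). You can use CONVERT() together with the third parameter, the style (in your case presumably 103, which means dd/mm/yyyy), to avoid this, but any implicit cast will depend on the session's language settings.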
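As a minimal sketch of the explicit-style approach (the column aliases are made up for illustration): the same string can be converted with style 103 (dd/mm/yyyy) or style 101 (mm/dd/yyyy), and the result no longer depends on the language that is set.

```sql
SET LANGUAGE GERMAN;

-- The style parameter fixes the interpretation, whatever the language is.
SELECT CONVERT(DATE, '1/3/2010', 103) AS DayFirst    -- 2010-03-01 (1 March)
      ,CONVERT(DATE, '1/3/2010', 101) AS MonthFirst; -- 2010-01-03 (3 January)
```

Which of the two styles matches your data is something only the system that wrote the XML can tell you.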

Another point is data within XML. Your dates like '1/1/2010' are not properly serialized. Within XML a date should be ISO 8601 (yyyy-mm-dd).

Some background: As you surely know, many datatypes are not displayed the way they are stored. A number like 3 is actually a binary pattern. Whenever you want a human to read it, it must be translated to a string representation. Whenever data must be embedded into string-based containers, it must be serialized. As long as the same rules are applied on serializing and de-serializing, this works fine. But the reading side must be able to rely on values that follow those rules.
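To see what a properly serialized date looks like, you can let SQL Server create the XML itself. A small sketch (the element names are invented here): FOR XML writes a DATE value in ISO 8601 format, no matter which language is set.

```sql
SET LANGUAGE GERMAN;

-- The unseparated literal '20100103' is read as yyyymmdd in any language.
SELECT CAST('20100103' AS DATE) AS SomeDate
FOR XML PATH('data');
-- <data><SomeDate>2010-01-03</SomeDate></data>
```

If the side that produces your XML wrote its dates like this, the reading side would not have to guess.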

Try this:

DECLARE @Xml XML=N'<root>
                    <data>
                      <SomeInt>1</SomeInt>
                      <SomeDate>2017-01-01</SomeDate>
                      <BadDate>1/3/2010</BadDate>
                    </data>
                    <data>
                      <SomeInt>5</SomeInt>
                      <SomeDate>2017-01-05</SomeDate>
                      <BadDate>4/3/2010</BadDate>
                    </data>
                    <data>
                      <SomeInt>3</SomeInt>
                      <SomeDate>2017-01-03</SomeDate>
                      <BadDate>5/1/2010</BadDate>
                    </data>
                   </root>';

The simple XQuery function max() will return the highest int value:

SELECT @xml.value(N'max(//SomeInt)','int') MaxInt;

But this does not work for a (correct ISO 8601) date, even though the Remarks section of the function's documentation suggests otherwise:

SELECT @xml.value(N'max(//SomeDate)','date') MaxDate; --returns NULL

You can use an embedded FLWOR expression to cast the values one by one before max() sees them:

SELECT @xml.value(N'max(for $d in //SomeDate return $d cast as xs:date?)','date') MaxDate;

But this does not work for your non-ISO 8601 dates, because these strings cannot be cast to xs:date:

SELECT @xml.value(N'max(for $d in //BadDate return $d cast as xs:date?)','date') MaxDate;

Back to your issue

You can read your values into a derived table as plain strings (similar to a staging table when importing data) and use T-SQL's abilities to convert them:

DECLARE @mockup TABLE  (Id INT IDENTITY , YourXml XML)

INSERT INTO @mockup VALUES 
(N'<RecentlyViewedEntityData etc="2">
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business 1</Title>        
        <LastAccessed>1/1/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/5/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/3/2010</LastAccessed>
    </RecentlyViewedItem>
</RecentlyViewedEntityData>');

--Retrieve the derived table as CTE:

WITH DerivedTable AS
(
    SELECT itm.value(N'(Type/text())[1]',N'nvarchar(max)') AS [Type]
          ,itm.value(N'(DisplayName/text())[1]',N'nvarchar(max)') AS [DisplayName]
          ,itm.value(N'(Title/text())[1]',N'nvarchar(max)') AS [Title]
          ,CONVERT(DATE,itm.value(N'(LastAccessed/text())[1]',N'nvarchar(max)'),103) AS [LastAccessed]
    FROM @mockup AS m
    OUTER APPLY m.YourXml.nodes(N'/RecentlyViewedEntityData/RecentlyViewedItem') AS A(itm)
)
SELECT TOP 1 * 
FROM DerivedTable
ORDER BY LastAccessed DESC;

After a correct cast (convert) to the native DATE type you can use ORDER BY in connection with TOP 1 to get the max value.

UPDATE: solution for a table with many rows

Your comment is correct, but this can be done more easily than with a self-join:

DECLARE @mockup TABLE  (Id INT IDENTITY , YourXml XML)

INSERT INTO @mockup VALUES 
(N'<RecentlyViewedEntityData etc="2">
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business 1</Title>        
        <LastAccessed>1/1/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/5/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/3/2010</LastAccessed>
    </RecentlyViewedItem>
</RecentlyViewedEntityData>')
,(N'<RecentlyViewedEntityData etc="2">
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business 1</Title>        
        <LastAccessed>1/1/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/5/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/3/2010</LastAccessed>
    </RecentlyViewedItem>
</RecentlyViewedEntityData>');

--Retrieve the derived table as CTE, include the row's id:

WITH DerivedTable AS
(
    SELECT Id
          ,itm.value(N'(Type/text())[1]',N'nvarchar(max)') AS [Type]
          ,itm.value(N'(DisplayName/text())[1]',N'nvarchar(max)') AS [DisplayName]
          ,itm.value(N'(Title/text())[1]',N'nvarchar(max)') AS [Title]
          ,CONVERT(DATE,itm.value(N'(LastAccessed/text())[1]',N'nvarchar(max)'),103) AS [LastAccessed]
    FROM @mockup AS m
    OUTER APPLY m.YourXml.nodes(N'/RecentlyViewedEntityData/RecentlyViewedItem') AS A(itm)
)
SELECT TOP 1 WITH TIES * 
FROM DerivedTable
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LastAccessed DESC);

The ORDER BY calls ROW_NUMBER() with an OVER() clause. This numbers the items by date in descending order, restarting for each row's Id (the partition). TOP 1 WITH TIES returns all rows that tie for first place in the ORDER BY, which is every row where ROW_NUMBER() is 1. That is the most recent item for each single table row.
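If TOP 1 WITH TIES looks too tricky, the same idea can be written with the row number as a visible column and a filter on it. This is only a sketch on a plain table variable (@items, its columns, and its values are made up for illustration), not on the XML itself:

```sql
DECLARE @items TABLE (Id INT, Title NVARCHAR(50), LastAccessed DATE);

INSERT INTO @items VALUES
 (1, N'A', '20100101')
,(1, N'B', '20100501')
,(2, N'C', '20100301')
,(2, N'D', '20100201');

WITH Numbered AS
(
    -- Number the items per Id, the newest date gets 1
    SELECT *
          ,ROW_NUMBER() OVER(PARTITION BY Id ORDER BY LastAccessed DESC) AS rn
    FROM @items
)
SELECT Id, Title, LastAccessed
FROM Numbered
WHERE rn = 1;
-- returns (1, B, 2010-05-01) and (2, C, 2010-03-01)
```

Both forms pick one item per Id. If two items of the same Id share the newest date, ROW_NUMBER() picks one of them arbitrarily.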

This was my first idea:

declare @xdoc xml = '<RecentlyViewedEntityData etc="2">
<RecentlyViewedItem>
    <Type></Type>
    <DisplayName>Contact</DisplayName>
    <Title>My Book of Business 1</Title>        
    <LastAccessed>1/1/2010</LastAccessed>
</RecentlyViewedItem>
<RecentlyViewedItem>
    <Type></Type>
    <DisplayName>Contact</DisplayName>
    <Title>My Book of Business</Title>      
    <LastAccessed>1/5/2010</LastAccessed>
</RecentlyViewedItem>
<RecentlyViewedItem>
    <Type></Type>
    <DisplayName>Contact</DisplayName>
    <Title>My Book of Business</Title>      
    <LastAccessed>1/3/2010</LastAccessed>
</RecentlyViewedItem>
</RecentlyViewedEntityData>';

DECLARE @xhandle INT

EXEC sp_xml_preparedocument @xhandle OUTPUT, @xdoc

;WITH tmp 
AS
(
SELECT  *
FROM    OPENXML(@xhandle, '//RecentlyViewedItem', 2)  
    WITH (
    DisplayName NVARCHAR(50)  'DisplayName',
    Title NVARCHAR(50) 'Title',
    LastAccessed DATE 'LastAccessed'
    )  
)
SELECT * FROM tmp WHERE tmp.LastAccessed = (SELECT MAX(LastAccessed) FROM tmp)

EXEC sp_xml_removedocument @xhandle

Perhaps this?

DECLARE @XML xml = '
<RecentlyViewedEntityData etc="2">
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business 1</Title>        
        <LastAccessed>1/1/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/5/2010</LastAccessed>
    </RecentlyViewedItem>
    <RecentlyViewedItem>
        <Type></Type>
        <DisplayName>Contact</DisplayName>
        <Title>My Book of Business</Title>      
        <LastAccessed>1/3/2010</LastAccessed>
    </RecentlyViewedItem>
</RecentlyViewedEntityData>';

WITH XMLData AS (
    SELECT  X.RR.value('(Type/text())[1]','int') AS [Type], --Guessed the datatype here
            X.RR.value('(DisplayName/text())[1]','varchar(50)') AS DisplayName,
            X.RR.value('(Title/text())[1]','varchar(50)') AS Title,
            X.RR.value('(LastAccessed/text())[1]','date') AS LastAccessed
    FROM @xml.nodes('RecentlyViewedEntityData/RecentlyViewedItem') X(RR)),
RNs AS(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY LastAccessed DESC) AS RN
    FROM XMLData)
SELECT [Type],
       DisplayName,
       Title,
       LastAccessed
FROM RNs
WHERE RN = 1;

Using XQuery will make the parsing easier:

    DECLARE @TB TABLE  (Id int , XCol XML)

    INSERT INTO @TB 
     VALUES (1, '<RecentlyViewedEntityData etc="2">
        <RecentlyViewedItem>
            <Type></Type>
            <DisplayName>Contact</DisplayName>
            <Title>My Book of Business 1</Title>        
            <LastAccessed>1/1/2010</LastAccessed>
        </RecentlyViewedItem>
        <RecentlyViewedItem>
            <Type></Type>
            <DisplayName>Contact</DisplayName>
            <Title>My Book of Business</Title>      
            <LastAccessed>1/5/2010</LastAccessed>
        </RecentlyViewedItem>
        <RecentlyViewedItem>
            <Type></Type>
            <DisplayName>Contact</DisplayName>
            <Title>My Book of Business</Title>      
            <LastAccessed>1/3/2010</LastAccessed>
        </RecentlyViewedItem>
    </RecentlyViewedEntityData>')


    SELECT MAX(LastAccessed) LastAccessed , DisplayName , Title
    FROM
    ( 
    SELECT TRY_CONVERT(DATE,x.c.value('./LastAccessed[1]', 'varchar(100)')) LastAccessed,
           x.c.value('./DisplayName[1]', 'varchar(100)') DisplayName,
           x.c.value('./Title[1]', 'varchar(100)') Title
    FROM @tb 
        CROSS APPLY Xcol.nodes ('/RecentlyViewedEntityData/RecentlyViewedItem') x(c)
    ) P
    GROUP BY DisplayName , Title

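NOTE: The query groups by DisplayName and Title, so it returns the latest date for each distinct combination of the two (two rows for the sample data), not one overall maximum. If your structure is as described and that is the result you need, this query will produce it.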

NOTE 2: I used TRY_CONVERT, which is available since SQL Server 2012. If your version is older (2008 or earlier), you'll need a regular CONVERT or CAST, wrapped in a CASE expression that handles strings which cannot be converted to a date.
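A rough sketch of such a fallback, assuming that ISDATE() is an acceptable test for your data (the variable @raw and its value are invented for the example):

```sql
DECLARE @raw VARCHAR(100);
SET @raw = 'not a date';

-- ISDATE() returns 1 only if the string is a valid datetime under the current
-- language and DATEFORMAT settings; anything else falls through to NULL.
SELECT CASE WHEN ISDATE(@raw) = 1 THEN CONVERT(DATETIME, @raw) END AS LastAccessed; -- NULL
```

Keep in mind that ISDATE() checks against DATETIME, so the sketch converts to DATETIME as well, and that its verdict depends on the session's language just like the conversion itself.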
