简体   繁体   中英

Parsing T-SQL To Extract Part of WHERE Clause

I have a large SQL database containing 'curves'. Each curve has an ID (curveid). I'm trying to determine main users of each curve and if it is used at all. To enable this, the DBAs are providing logs of all statements executed against the database.

These statements can very in complexity. All I want to do is extract what curveids are being queried for.

Example statement follows:

WITH G AS ( SELECT [Timevalue] FROM [mc].[GranularityLookup] 
WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00' 
AND [1 Hr] = 1), 
D AS ( SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber 
FROM mc.CURVEID_6657_1_LATEST data 
JOIN 
(SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate 
FROM mc.CURVEID_6657_1_LATEST
WHERE CurveID = 90564
    AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY DeliveryDate,  CurveID ) Dates 
ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate 
WHERE data.CurveID = 90564
AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00'
GROUP BY [CurveID], [PublishDate], [DeliveryDate] )
SELECT 
G.[TimeValue] [Deliver
yDate] , D.[PublishDate], D.[Value], D.[PeriodNumber]
FROM 
G
LEFT JOIN 
D
ON 
G.[TimeValue] = D.[DeliveryDate]
ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC

From this statement, all I'm interested in is extracting that the user queried for curveid 90564.

The statement may also resemble either of the following:

SELECT * FROM anytable WHERE curveid = 123 AND deliverydate BETWEEN '2017-01-01' AND 2017-02-01'

or

SELECT * FROM mc.anytable WHERE curveid IN (1,2,3,4,5,6,7)

Again, all I want to know is the curve IDs. I don't care about any of the other clauses.

I'm using the Microsoft.SqlServer.TransactSql.ScriptDom namespace to parse the SQL and have got to the point where I can identify all WHERE statements using code similar to below (pieced together from some other samples):

string sql = @"WITH 
            G AS ( SELECT [Timevalue] FROM [mc].[GranularityLookup] 
            WHERE [TimeValue] BETWEEN '19-Jul-2017 00:00' AND '30-Sep-2017 00:00' 
            AND [1 Hr] = 1), 
            D AS ( SELECT [CurveID], [DeliveryDate], [PublishDate], AVG([Value]) Value, MAX([PeriodNumber]) PeriodNumber 
            FROM mc.CURVEID_6657_1_LATEST data 
            JOIN 
            (SELECT CurveID ID, DeliveryDate dDate, MAX(PublishDate) pDate 
            FROM mc.CURVEID_6657_1_LATEST
            WHERE CurveID = 90564
                AND DeliveryDate >= '19-Jul-2017 00:00' AND DeliveryDate <= '30-Sep-2017 00:00'
            GROUP BY DeliveryDate,  CurveID ) Dates 
            ON data.DeliveryDate = dates.dDate AND data.PublishDate = dates.pDate 
            WHERE data.CurveID = 90564
            AND data.DeliveryDate >= '19-Jul-2017 00:00' AND data.DeliveryDate <= '30-Sep-2017 00:00'
            GROUP BY [CurveID], [PublishDate], [DeliveryDate] )
            SELECT 
            G.[TimeValue] [Deliver
            yDate] , D.[PublishDate], D.[Value], D.[PeriodNumber]
            FROM 
            G
            LEFT JOIN 
            D
            ON 
            G.[TimeValue] = D.[DeliveryDate]
            ORDER BY DeliveryDate ASC, PeriodNumber ASC, publishDate DESC";
            var parser = new TSql120Parser(false);

            IList<ParseError> errors;
            var fragment = parser.Parse(new StringReader(sql), out errors);

            var whereVisitor = new WhereVisitor();
            fragment.Accept(whereVisitor);

            //  I now have all WHERE clauses in whereVisitor.WhereStatements

class WhereVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<WhereClause> WhereStatements = new List<WhereClause>();

    public override void Visit(WhereClause node)
    {
        WhereStatements.Add(node);
    }

}

Each of the clauses in whereVisitor.WhereStatements (3 in this example) expose a property called SearchCondition. Unfortunately, this is where I run out of ideas. What I want to achieve is logic as per below:

foreach (var clause in whereVisitor.WhereStatements)
{
    //  IF any part of the clause filters based on curveid THEN

    //        Capture curveIDs

    //  END IF
}

Other details:

  • Using C# (.net 4.0)
  • SQL Server 2008
  • DLL for this is Microsoft.SqlServer.TransactSql.ScriptDom (located in my case at 'c:\\Program Files (x86)\\Microsoft SQL Server\\130\\Tools\\PowerShell\\Modules\\SQLPS\\Microsoft.SqlServer.TransactSql.ScriptDom.dll')

Edit 1

Some additional info:

  • CurveID is a key to another table. It would not make sense in this case to operate on it (eg curveId+1 or curveId <= 10).

Edit 2 (Partial Solution)

Having the following visitor helps with the case where clause resembles curveid = 123:

class CurveIdVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(BooleanComparisonExpression exp)
    {
        if (exp.FirstExpression is ColumnReferenceExpression && exp.SecondExpression is IntegerLiteral )
        {
            //  there is a possibility that this is of the ilk 'curveid = 123'
            //  we will look for the 'identifier'
            //  we take the last if there are multiple.  Example:
            //      alias.curveid
            //  goives two identifiers: alias and curveid
            if (
                ((ColumnReferenceExpression) exp.FirstExpression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() ==
                "curveid")
            {
                //  this is definitely a curveid filter
                //  Now to find the curve id
                int curveid = int.Parse(((IntegerLiteral) exp.SecondExpression).Value);
                CurveIds.Add(curveid);
            }
        }

Finally solved this and hope this benefits someone else in the future. Perhaps someone else may read ni time and provide a better solution.

public class SqlParser
{
    public List<int> GetQueriedCurveIds(string sql)
    {
        var parser = new TSql120Parser(false);

        IList<ParseError> errors;
        var fragment = parser.Parse(new StringReader(sql), out errors);

        List<int> curveIds = new List<int>();
        CurveIdVisitor cidv = new CurveIdVisitor();
        InPredicateVisitor inpv = new InPredicateVisitor();
        fragment.AcceptChildren(cidv);
        fragment.AcceptChildren(inpv);

        curveIds.AddRange(cidv.CurveIds);
        curveIds.AddRange(inpv.CurveIds);
        return curveIds.Distinct().ToList();
    }
}



class CurveIdVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(BooleanComparisonExpression exp)
    {
        if (exp.FirstExpression is ColumnReferenceExpression && exp.SecondExpression is IntegerLiteral )
        {
            //  there is a possibility that this is of the ilk 'curveid = 123'
            //  we will look for the 'identifier'
            //  we take the last if there are multiple.  Example:
            //      alias.curveid
            //  goives two identifiers: alias and curveid
            if (
                ((ColumnReferenceExpression) exp.FirstExpression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() ==
                "curveid")
            {
                //  this is definitely a curveid filter
                //  Now to find the curve id
                int curveid = int.Parse(((IntegerLiteral) exp.SecondExpression).Value);
                CurveIds.Add(curveid);
            }
        }
    }
}

class InPredicateVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<int> CurveIds = new List<int>();

    public override void Visit(InPredicate exp)
    {
        if (exp.Expression is ColumnReferenceExpression)
        {
            if (
                ((ColumnReferenceExpression) exp.Expression).MultiPartIdentifier.Identifiers.Last().Value.ToLower() ==
                "curveid")
            {
                foreach (var value in exp.Values)
                {
                    if (value is IntegerLiteral)
                    {
                        CurveIds.Add(int.Parse(((IntegerLiteral)value).Value));
                    }
                }
            }
        }
    }
}

This is cut down code to demonstrate answer. In real life, you would want to check the ParseError collection and add some error handling!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM