Poor performance in SQL Server function

Question

An application which we have built has undergone a large change in its database schema, particularly in the way financial data is stored. We have functions that calculate the total amount of billing, based on various scenarios; and the change is causing huge performance problems when the functions must be run many times in a row.

I'll include an explanation, the function and the relevant schema, and I hope someone sees a much better way to write the function. This is SQL Server 2008.

First, the business basis: think of a medical Procedure. The healthcare Provider performing the Procedure sends one or more Bills, each of which may have one or more line items (BillItems).

That Procedure is then re-billed to another party. The amount billed to the third party may be:

The total of the Provider's billing
The total of the Provider's billing plus a Copay amount, or
A completely separate amount (a Rebill amount)

The current function for calculating the billing for a Procedure looks at all three scenarios:

CREATE FUNCTION [dbo].[fnProcTotalBilled]  (@PROCEDUREID INT)
    RETURNS MONEY AS
BEGIN
DECLARE @billed MONEY
    SELECT @billed = (SELECT COALESCE((SELECT COALESCE(sum(bi.Amount),0)
    FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
        INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
    WHERE b.ProcedureID=@PROCEDUREID
    AND p.StatusID=3
    AND b.HasCopay=0
    AND b.Rebill=0),0))
-- the total of the provider's billing, with no copay and not rebilled
    +
    (SELECT COALESCE((SELECT sum(bi.Amount) + COALESCE(b.CopayAmt,0)
    FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
        INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
    WHERE b.ProcedureID=@PROCEDUREID
    AND p.StatusID=3
    AND b.HasCopay=1
    GROUP BY b.billid,b.CopayAmt),0))
-- the total of the provider's billing, plus a Copay amount
    +
    (SELECT COALESCE((SELECT sum(COALESCE(b.RebillAmt,0))
    FROM Bills b
        INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
    WHERE b.ProcedureID=@PROCEDUREID
    AND p.StatusID=3
    AND b.Rebill=1),0))
-- the Rebill amount, instead of the provider's billing
    RETURN @billed
END

I'll omit the DDL for the Procedure. Suffice it to say, it must have a certain status (shown in the function as p.StatusID=3).

Here are the DDLs for Bills and related BillItems:

CREATE TABLE dbo.Bills (
    BillID int IDENTITY(1,1) NOT NULL,
    InvoiceID int DEFAULT ((0)),
    CaseID int NOT NULL,
    ProcedureID int NOT NULL,
    TherapyGroupID int DEFAULT ((0)) NOT NULL,
    ProviderID int NOT NULL,
    Description varchar(1000),
    ServiceDescription varchar(255),
    BillReferenceNumber varchar(100),
    TreatmentDate datetime,
    DateBilled datetime,
    DateBillReceived datetime,
    DateBillApproved datetime,
    HasCopay bit DEFAULT ((0)) NOT NULL,
    CopayAmt money,
    Rebill bit DEFAULT ((0)) NOT NULL,
    RebillAmt money,
    IncludeInDemand bit DEFAULT ((1)) NOT NULL,
    CreateDate datetime DEFAULT (getdate()) NOT NULL,
    CreatedByID int,
    ChangeDate datetime,
    ChangeUserID int,
    PRIMARY KEY (BillID)
);


CREATE TABLE dbo.BillItems (
    BillItemID int IDENTITY(1,1) NOT NULL,
    BillID int NOT NULL,
    ItemDescription varchar(1000),
    Amount money,
    WillNotBePaid bit DEFAULT ((0)) NOT NULL,
    CreateDate datetime DEFAULT (getdate()),
    CreatedByID int,
    ChangeDate datetime,
    ChangeUserID varchar(25),
    PRIMARY KEY (BillItemID)
);

I fully realize how complex the function is; but I couldn't find another way to account for all the scenarios.

I'm hoping that a far better SQL programmer or DBA will see a more performant solution.

Any help will be greatly appreciated.

Thanks,

Tom

UPDATE:

Thanks to everyone for their replies. I tried to add a little clarification in comments, but I'll do so here, too.

First, a definition: a Procedure is a medical service from a Provider on a single Date of Service. We only concern ourselves with the total amount billed for a procedure; multiple persons do not receive bills.

A "Case" can have many Procedures.

Generally, a single Procedure will have a single Bill - but not always. A Bill may have one or more BillItems. The Copay (if one exists) is added to the sum of the BillItems. A Rebill Amount trumps everything.

The performance issue comes into play at a higher level, when calculating the totals for an entire Case (many Procedures) and when needing to display grid data that shows hundreds of Cases at once.

My query was at the Procedure level, because it was simpler to describe the problem.

As to sample data, the data in @Serpiton's SQL Fiddle is an excellent, concise example. Thank you very much for it.

In reviewing the answers, it seems to me that @Serpiton's CTE approach and @GarethD's view approach are both strong improvements on my original. For the moment, I'm going to work with the CTE approach, simply to avoid the necessity of dealing with the multiple results from the SELECT.

I have modified @Serpiton's CTE to work at the Case level. If he or others would please take a look at it, I'd appreciate it. It's working well in my testing, but I'd appreciate other eyes on it.

It goes like this:

WITH Normal As (
SELECT b.BillID
   , b.CaseID
   , sum(coalesce(n.Amount * (1 - b.Rebill), 0)) Amount
FROM   Procedures p
     INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID
     LEFT  JOIN BillItems n ON b.BillID = n.BillID
WHERE  b.CaseID = 3444
AND  p.StatusID = 3
GROUP BY b.CaseID,b.BillID, b.HasCopay
)
SELECT Amount = Sum(b.Amount) 
          + Sum(Coalesce(c.CopayAmt, 0)) 
          + Sum(Coalesce(r.RebillAmt, 0))
FROM   Normal b
   LEFT  JOIN Bills c ON b.BillID = c.BillID And c.HasCopay = 1
   LEFT  JOIN Bills r ON b.BillID = r.BillID And r.Rebill = 1
GROUP BY b.caseid

Answer 1

A very quick win is to use a (TABLE VALUED) (INLINE) FUNCTION instead of a (SCALAR) (MULTI-STATEMENT) FUNCTION.

CREATE FUNCTION [dbo].[fnProcTotalBilled]  (@PROCEDUREID INT)
RETURNS TABLE   -- required for an inline table-valued function
AS
RETURN (
        SELECT
          (sub-query1)
          +
          (sub-query2)
          +
          (sub-query3)   AS amount
       );

This can then be used as follows:

SELECT
  something.*,
  totalBilled.*
FROM
  something
CROSS APPLY            -- Or OUTER APPLY
  [dbo].[fnProcTotalBilled](something.procedureID)   AS totalBilled

Over larger data-sets this is significantly faster than using scalar functions.
- It must be INLINE (Not Multi-Statement)
- It must be TABLE-VALUED (Not Scalar)

If you work out simpler business logic for the calculation, you'll get further performance benefits on top of that.

EDIT :

This may be functionally the same as what you have described, but it's hard to tell. Please add comments to my answer if you want to investigate further.

SELECT
  SUM(
    CASE WHEN b.HasCopay = 0 AND b.Rebill = 0 THEN COALESCE(bi.TotalAmount, 0)
         WHEN b.HasCopay = 1                  THEN COALESCE(b.CopayAmt, 0) + COALESCE(bi.TotalAmount, 0)
         WHEN                    b.Rebill = 1 THEN COALESCE(b.RebillAmt, 0)
                                              ELSE 0
    END
  )  AS Amount
FROM
  Procedures p
INNER JOIN
  Bills      b
    ON  b.ProcedureID = p.ProcedureID
LEFT JOIN
(
  SELECT BillID, SUM(Amount) AS TotalAmount
    FROM BillItems
GROUP BY BillID
)
  AS bi
    ON  bi.BillID     = b.BillID
WHERE
      p.ProcedureID=@PROCEDUREID
  AND p.StatusID=3

The 'trick' that makes this simpler is the sub-query that aggregates all the BillItems into one record per BillID. The optimiser won't actually do that for the whole table, but only for the relevant records based on your JOINs and WHERE clause.

This then means that Bill : BillItem is 1 : 0..1, and everything simplifies. I believe ;)
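
Putting the two parts of this answer together, a sketch of what the inline table-valued function might look like is shown below. The name dbo.tvfProcTotalBilled is hypothetical (chosen so it does not collide with the existing scalar function), the dbo schema for Procedures is assumed, and the CASE logic is the one from the EDIT above, with COALESCE added around the nullable money columns. It has not been checked against real data.

```sql
-- Sketch only: an inline TVF wrapping the single-SELECT version of the calculation.
-- dbo.tvfProcTotalBilled is a hypothetical name, not part of the original schema.
CREATE FUNCTION dbo.tvfProcTotalBilled (@ProcedureID INT)
RETURNS TABLE
AS
RETURN (
    SELECT
        -- The outer COALESCE turns "no qualifying bills" into 0 instead of NULL.
        COALESCE(SUM(
            CASE WHEN b.HasCopay = 0 AND b.Rebill = 0 THEN COALESCE(bi.TotalAmount, 0)
                 WHEN b.HasCopay = 1 THEN COALESCE(b.CopayAmt, 0) + COALESCE(bi.TotalAmount, 0)
                 WHEN b.Rebill = 1   THEN COALESCE(b.RebillAmt, 0)
                 ELSE 0
            END), 0) AS Amount
    FROM dbo.Procedures p
    INNER JOIN dbo.Bills b
        ON b.ProcedureID = p.ProcedureID
    LEFT JOIN (
        -- One row per bill, so Bill : BillItem becomes 1 : 0..1
        SELECT BillID, SUM(Amount) AS TotalAmount
        FROM dbo.BillItems
        GROUP BY BillID
    ) AS bi
        ON bi.BillID = b.BillID
    WHERE p.ProcedureID = @ProcedureID
      AND p.StatusID = 3
);
GO  -- batch separator (SSMS / sqlcmd)

-- Usage: one total per procedure, computed as part of the outer query plan.
SELECT p.ProcedureID, t.Amount
FROM dbo.Procedures p
CROSS APPLY dbo.tvfProcTotalBilled(p.ProcedureID) AS t;
```

Because the function body is a scalar aggregate with no GROUP BY, it always returns exactly one row, so CROSS APPLY does not drop procedures that have no bills.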

Answer 2

The first thing I have noticed is that your query could fail if there is more than one billID for a procedureID (I don't know if this is possible in your design though). If it is and it happens then this part will fail:

(SELECT COALESCE((SELECT sum(bi.Amount) + COALESCE(b.CopayAmt,0)
FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
    INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
WHERE b.ProcedureID=@PROCEDUREID
AND p.StatusID=3
AND b.HasCopay=1
GROUP BY b.billid,b.CopayAmt),0))

Because of the grouping, the subquery returns one row per bill, and a subquery used as a scalar expression may return only one value. I don't think this would affect my overall suggestion, though.
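
To illustrate the failure described above, here is a small self-contained sketch using table variables with made-up data (one procedure, two copay bills). The failing statement is left commented out; uncommented, it should raise SQL Server's "Subquery returned more than 1 value" error (Msg 512). The second statement shows one possible fix, summing the per-bill totals in an outer aggregate.

```sql
-- Hypothetical sample data: procedure 10 has two bills, each with a copay.
DECLARE @Bills TABLE (BillID INT, ProcedureID INT, CopayAmt MONEY);
DECLARE @BillItems TABLE (BillID INT, Amount MONEY);

INSERT INTO @Bills VALUES (1, 10, 5.00), (2, 10, 7.50);
INSERT INTO @BillItems VALUES (1, 100.00), (1, 20.00), (2, 50.00);

-- Fails when uncommented: the grouped subquery yields one row per bill (two rows here).
-- SELECT COALESCE((SELECT SUM(bi.Amount) + COALESCE(b.CopayAmt, 0)
--                  FROM @BillItems bi
--                  INNER JOIN @Bills b ON b.BillID = bi.BillID
--                  WHERE b.ProcedureID = 10
--                  GROUP BY b.BillID, b.CopayAmt), 0);

-- Works: compute each bill's total first, then sum those totals into a single value.
SELECT COALESCE((
    SELECT SUM(x.BillTotal)
    FROM (
        SELECT SUM(bi.Amount) + COALESCE(b.CopayAmt, 0) AS BillTotal
        FROM @BillItems bi
        INNER JOIN @Bills b ON b.BillID = bi.BillID
        WHERE b.ProcedureID = 10
        GROUP BY b.BillID, b.CopayAmt
    ) AS x
), 0) AS CopayTotal;   -- (100 + 20 + 5) + (50 + 7.50) = 182.50
```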

I would consider turning this into a view. When you run this as a scalar UDF it is executed once per row, whereas when you use a view the definition is expanded into the outer query and can be optimised accordingly.

You can also turn this into a single select. The first step would be to get the components common to all three subqueries:

SELECT  p.ProcedureID,
        bi.Amount,
        b.HasCopay,
        b.CopayAmt,
        b.Rebill,
        b.RebillAmt
FROM    (   SELECT  BillID, Amount = SUM(Amount)
            FROM    BillItems 
            GROUP BY BillID
        ) bi
        INNER JOIN Bills b
            ON b.BillID = bi.BillID
        INNER JOIN Procedures p
            ON p.ProcedureID = b.ProcedureID
WHERE   p.StatusID = 3;

You can now combine the logic of the 3 subqueries to get the same total:

SELECT  p.ProcedureID,
        Amount = CASE WHEN b.Rebill = 0 THEN bi.Amount ELSE 0 END,
        CopayAmt = CASE WHEN b.HasCopay = 1 THEN b.CopayAmt ELSE 0 END,
        RebillAmt = CASE WHEN b.Rebill = 1 THEN b.RebillAmt ELSE 0 END
FROM    (   SELECT  BillID, Amount = SUM(Amount)
            FROM    BillItems 
            GROUP BY BillID
        ) bi
        INNER JOIN Bills b
            ON b.BillID = bi.BillID
        INNER JOIN Procedures p
            ON p.ProcedureID = b.ProcedureID
WHERE   p.StatusID = 3;

You can now aggregate this and move it to a view for reusability (I have moved the case statements above into an APPLY simply to avoid repeating them in the Total column):

CREATE VIEW dbo.ProcTotalBilled
AS
    SELECT  p.ProcedureID,
            Amount = SUM(calc.Amount),
            CopayAmt = SUM(calc.CopayAmt),
            Rebill = SUM(calc.RebillAmt),
            Total = SUM(calc.Amount +  calc.CopayAmt + calc.RebillAmt)
    FROM    (   SELECT  BillID, Amount = SUM(Amount)
                FROM    BillItems 
                GROUP BY BillID
            ) bi
            INNER JOIN Bills b
                ON b.BillID = bi.BillID
            INNER JOIN Procedures p
                ON p.ProcedureID = b.ProcedureID
            CROSS APPLY
            (   SELECT  Amount = CASE WHEN b.Rebill = 0 THEN bi.Amount ELSE 0 END,
                        CopayAmt = CASE WHEN b.HasCopay = 1 THEN b.CopayAmt ELSE 0 END,
                        RebillAmt = CASE WHEN b.Rebill = 1 THEN b.RebillAmt ELSE 0 END
            ) calc
    WHERE   p.StatusID = 3
    GROUP BY p.ProcedureID;

Then instead of using something like:

SELECT  Total = dbo.fnProcTotalBilled(p.ProcedureID)
FROM    dbo.Procedures p;

You would use

SELECT  Total = ISNULL(ptb.Total, 0)
FROM    dbo.Procedures p
        LEFT JOIN dbo.ProcTotalBilled ptb
            ON ptb.ProcedureID = p.ProcedureID;

This is slightly more verbose, but I would be surprised if it didn't outperform your scalar UDF considerably.

Answer 3

Answer to the update
To increase performance, you can create a view with the same definition as the CTE, so that the logic lives in one place and is expanded into whichever query uses it.
If you have to calculate more than one total amount, don't try to get them individually. A better plan is to get all of them with a single query, writing a condition like

WHERE b.CaseID IN (list of cases)

or some other condition that fits your needs, and adding some more information to the main query, at least the CaseID.
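
As a sketch of that suggestion, the Case-level CTE from the question's update could be changed to return one row per case for a whole list of cases. The CaseID values other than 3444 are made up for illustration, and the CASE expression is used here as an equivalent of the Amount * (1 - Rebill) trick; this has not been run against the real tables.

```sql
-- Sketch: totals for several cases in one pass instead of one call per case.
WITH Normal AS (
    SELECT b.BillID
         , b.CaseID
           -- Bill items count only when the bill is not a rebill.
         , SUM(CASE WHEN b.Rebill = 0 THEN COALESCE(n.Amount, 0) ELSE 0 END) AS Amount
    FROM   Procedures p
           INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID
           LEFT  JOIN BillItems n ON b.BillID = n.BillID
    WHERE  b.CaseID IN (3444, 3445, 3446)   -- hypothetical list of cases
      AND  p.StatusID = 3
    GROUP BY b.CaseID, b.BillID
)
SELECT b.CaseID                              -- identifies which total belongs to which case
     , Amount = SUM(b.Amount)
              + SUM(COALESCE(c.CopayAmt, 0))
              + SUM(COALESCE(r.RebillAmt, 0))
FROM   Normal b
       LEFT  JOIN Bills c ON b.BillID = c.BillID AND c.HasCopay = 1
       LEFT  JOIN Bills r ON b.BillID = r.BillID AND r.Rebill = 1
GROUP BY b.CaseID;
```

A grid showing hundreds of cases can then join to this result on CaseID rather than invoking a function for each row.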

Update
@DRapp pointed out a problem with my previous solution (which I wrote without testing, sorry folks). To fix it, I have removed BillItems from the main query, which now works only with Bills.

WITH Normal As (
  SELECT b.BillID
       , b.ProcedureID
       , sum(coalesce(n.Amount * (1 - b.Rebill), 0)) Amount
  FROM   Procedures p
         INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID
         LEFT  JOIN BillItems n ON b.BillID = n.BillID
  WHERE  p.ProcedureID = @PROCEDUREID
    AND  p.StatusID = 3
  GROUP BY b.ProcedureID, b.BillID, b.HasCopay
)
SELECT @Billed = Sum(b.Amount) 
               + Sum(Coalesce(c.CopayAmt, 0)) 
               + Sum(Coalesce(r.RebillAmt, 0))
FROM   Normal b
       LEFT  JOIN Bills c ON b.BillID = c.BillID And c.HasCopay = 1
       LEFT  JOIN Bills r ON b.BillID = r.BillID And r.Rebill = 1
GROUP BY b.ProcedureID

How it works
The Normal CTE gets all the bills related to the ProcedureID and calculates each bill's total; the Amount * (1 - Rebill) expression sets the Amount to 0 if the bill is a rebill.
In the main query, the Normal CTE is joined to the special types of bill (copay and rebill). Because Normal already contains all the Bills for the selected ProcedureID, the Procedures table is not needed there.

Demo with random data.

Old Query
Without data to test the query against, this is flying blind:

SELECT @billed = Sum(Coalesce(n.Amount, 0)) 
               + Sum(Coalesce(c.CopayAmt, 0)) 
               + Sum(Coalesce(r.RebillAmt, 0))
FROM   Procedures p
       INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID And b.Rebill = 0
       INNER JOIN BillItems n ON b.BillID = n.BillID
       INNER JOIN Bills c ON p.ProcedureID = c.ProcedureID And c.HasCopay = 1
       INNER JOIN Bills r ON p.ProcedureID = r.ProcedureID And r.Rebill = 1
Where  p.ProcedureID = @PROCEDUREID
  AND  p.StatusID = 3

Here b is the alias for the "normal" bill (with n for the bill items), c for the copay bill and r for the rebilled one.
The JOIN condition on b checks only for b.Rebill = 0, to get the bill items for both the "normal" bills and the copay ones.
I assume that no bill can have both HasCopay and Rebill set to 1.

Answer 4

Can you show some sample data that covers the variety of scenarios? Also, I would expect Procedures to be more of a lookup table, where many people could be billed for the same procedure, so the BillID would be critical to the function: what has been billed to a given person for a given procedure? The function would then have TWO parameters, one for the procedure you are interested in and a second for the patient's actual Bill.

Then, the inner queries would be restricted down to the one person's bill... Unless the procedure is unique per person having the procedure done, but that is unclear since DDL for procedure is not provided.

I have other thoughts on the querying, but would need clarification from above context so I do not throw crud here just to show a query.

Answer 5

After all, you need some values from bills along with the sum of bill items. You could simplify the query thus:

select sum
(
  coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
  coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
  coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 )
) as value
from procedures p
inner join bills b on b.procedureid = p.procedureid
where p.ProcedureID = @PROCEDUREID
and p.StatusID = 3;

but T-SQL does not allow this and complains with "Cannot perform an aggregate function on an expression containing an aggregate or a subquery". So you will have to use an inner and outer select instead.

select sum(value) as total
from
(
  select
    coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
    coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
    coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
  from procedures p
  inner join bills b on b.procedureid = p.procedureid
  where p.ProcedureID = @PROCEDUREID
  and p.StatusID = 3
) allvalues;

You wouldn't even have to join table procedures with the bills table, but get the procedure id in an inner select. But I tried it with Serpiton's SQL fiddle (thanks to Serpiton for this) and T-SQL processes this slower than the join. You can try it anyhow. Maybe it is faster in your SQL Server version with your tables:

select sum(value) as total
from
(
  select
    coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
    coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
    coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
  from bills b
  where b.procedureid = 
  (
    select p.procedureid
    from procedures p
    where p.ProcedureID = @PROCEDUREID
    and p.StatusID = 3
  )
) allvalues;

EDIT: Here is one more option. Provided the given procedure id always exists and you only want to check if the status id is 3, then you can write the statement so that the bill select is only executed in the case of status id = 3. That doesn't have to be faster; it can even turn out to be slower. It's just one more option you can try.

select
  case when p.StatusID = 3 then
    (
      select sum(value)
      from
      (
        select
          coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
          coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
          coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
        from bills b 
        where b.procedureid = p.procedureid
      ) allvalues
    )
  else
    0
  end as value 
from procedures p
where p.ProcedureID = @PROCEDUREID;

5 answers

solution1
2 2014-04-24 14:06:48

solution2
1 2014-04-24 14:23:06

solution3
1 ACCEPTED 2014-04-24 16:06:59

solution4
0 2014-04-24 14:47:23

solution5
0 2014-04-27 00:33:34
