
Get previous row's data and calculate growth in SSIS script component

I have a table like this:

Month_ID  Month_Sales 
1         500.0 
2         250.0 
3         150.5 

I want to add a new column, "Growth", to this table, where:

Growth = (Current Month Sales - Prev Month Sales) / Prev Month Sales
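As a quick check of the formula against the sample rows above (Month_ID 1 has no previous month, so its growth is left undefined here):

```latex
\begin{aligned}
\text{Growth}_2 &= \frac{250.0 - 500.0}{500.0} = -0.5 \\
\text{Growth}_3 &= \frac{150.5 - 250.0}{250.0} = -0.398
\end{aligned}
```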

I want to do this using an SSIS Script Component. How can I do this?

1/ Create two package variables:

  • CurrentMonthSales

  • PrevMonthSales

Both are of type Double.

2/ Add a Data Flow Task with a source component (which one depends on the data source you are dealing with) and choose whether it reads from a table or a query.

Let's assume it is a table.

3/ Drag a Row Count transformation from the Data Flow Transformations list. Double-click it and, in the Variable Names section, select the variable User::CurrentMonthSales. At run time, Row Count saves its result in that variable.

4/ Add a second Data Flow Task to the Control Flow. Inside it, drag another OLE DB Source that reads the same table. Add another Row Count transformation, but this time use the variable User::PrevMonthSales. After the Row Count, add either a Script Component or a Derived Column.

5/ If you use a Derived Column, give the column a name, then choose to replace an existing column if you want to overwrite that column's value, or to add a new column if you want the result as an additional output column.

6/ In the expression, apply the formula:

(@[User::CurrentMonthSales] - @[User::PrevMonthSales]) / @[User::PrevMonthSales]

and map the result to the corresponding column in your destination component.

For this to work as expected, your input data must be sorted. Either:

  • Add an explicit sort to your data source. Change the data access method from table or view to query and use something like SELECT * FROM dbo.MyTable AS T ORDER BY T.Month_ID. Then, in the source's Advanced Editor, indicate that the data is sorted by that column (set IsSorted to True on the output and SortKeyPosition to 1 on Month_ID).
  • Add an explicit Sort component to your data flow between the source and the Script Component. Most people choose this option because they think it's what you're supposed to do (they provide you with a Sort component, after all), but don't. You will get better performance by sorting in the source query, before the data ever gets into the pipeline.

Script Component

I'll be freehanding the script, as my current box doesn't have SSIS installed, so apologies if I flub something. Hopefully the explanation of the method will outweigh any code errors.

A Script Component is, by default, a synchronous component: one row in, one row out. So there's no native way to say "previous row", as only the current row's context is available to us, unless we explicitly store what we need in variables.

So, synchronous behavior is fine here, but we need to go to the Inputs and Outputs tab and add our new output column, Growth. Its data type should be Decimal or Numeric. Numeric lets you specify both precision and scale; whichever you pick, make sure the scale is non-zero, or the fractional part of the growth will be lost.

In the input column list, check Month_Sales as a ReadOnly column; Growth is going to be ReadWrite.

In the script itself, we need to create a class-scoped member variable that will "remember" what the previous Month_Sales value was. For each row that enters the buffer, we'll:

  1. Compute the Growth
  2. Push that value into the pipeline
  3. Update our member variable with the current row's Month_Sales value in preparation for the next row.

Business question: what is the growth for Month_ID 1 in the above? We have no previous month value. If we initialize the previous month's sales to 0, we'll get a divide-by-zero error unless we guard against it.

public class ScriptMain : UserComponent
{
    // Remembers the previous row's Month_Sales between calls.
    // decimal is used so the result can be assigned to the Decimal/Numeric Growth column.
    private decimal previousSales;

    public override void PreExecute()
    {
        base.PreExecute();
        // By init to zero, we need to guard against divide by zero
        this.previousSales = 0m;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        decimal currentSales = 0m;
        decimal growth = 0m;

        // Worry about nulls in source data
        if (!Row.MonthSales_IsNull)
        {
            // Convert.ToDecimal (needs "using System;") accepts a double or a decimal source column
            currentSales = Convert.ToDecimal(Row.MonthSales);
        }

        // Avoid division by zero
        if (this.previousSales != 0m)
        {
            growth = (currentSales - this.previousSales) / this.previousSales;
        }

        // Might want to set Row.Growth_IsNull = true in an else clause if no calculation was performed
        Row.Growth = growth;

        // Update the "previous" value with current for the next row
        // Again, be wary of nulls and understand how business expects the calculations to work
        this.previousSales = currentSales;
    }
}
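Since the script above was written without SSIS at hand, the same running-state logic can be tried outside the pipeline first. The following is only a minimal sketch: the names GrowthSketch and ComputeGrowth are hypothetical and not part of SSIS. It assumes decimal sales values, treats a null sale as 0 like the script does, and, unlike the script, reports null rather than 0 when there is no usable previous value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class GrowthSketch
{
    // Hypothetical helper: returns one growth value per input row,
    // or null when the previous value is missing or zero.
    public static List<decimal?> ComputeGrowth(IEnumerable<decimal?> monthSales)
    {
        var result = new List<decimal?>();
        decimal previousSales = 0m; // plays the role of the class-scoped member variable

        foreach (decimal? sales in monthSales)
        {
            // Treat a null source value as zero, as the script does
            decimal currentSales = sales ?? 0m;

            // Guard against division by zero (first row, or a zero previous month)
            decimal? growth = null;
            if (previousSales != 0m)
            {
                growth = (currentSales - previousSales) / previousSales;
            }

            result.Add(growth);

            // Remember the current value for the next row
            previousSales = currentSales;
        }

        return result;
    }

    public static void Main()
    {
        // Sample rows from the question, already sorted by Month_ID
        var sales = new decimal?[] { 500.0m, 250.0m, 150.5m };

        foreach (decimal? g in ComputeGrowth(sales))
        {
            Console.WriteLine(g.HasValue
                ? g.Value.ToString(CultureInfo.InvariantCulture)
                : "NULL");
        }
    }
}
```

The input must already be ordered by Month_ID, for the same reason the data flow needs a sorted source: the "previous" value is simply whatever row arrived last.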

Query approach

Assuming SQL Server 2012+ or a comparable RDBMS, you can compute that value efficiently with a windowing function. Here, I use LAG, as it provides access to the previous row.

You'll note I use a derived table in the query. That's not strictly necessary, but I find it a helpful shorthand to avoid copy/pasting the LAG function twice for the numerator and denominator.

CREATE TABLE dbo.SO_69565661
(
    Month_ID int NOT NULL
,   Month_Sales decimal(9,2) NOT NULL
);

INSERT INTO dbo.SO_69565661
SELECT 1, 500.0
UNION ALL SELECT 2, 250.0
UNION ALL SELECT 3, 150.5;

-- Make use of a derived table so the LAG expression is written only once
-- Use something like this in your OLE DB Source (or even skip the data flow and make this an Execute SQL Task by adding the INSERT/UPDATE)
SELECT
    D.*
    -- NULLIF turns a zero PreviousSales into NULL, which avoids a divide-by-zero error
,   (D.Month_Sales - D.PreviousSales) / NULLIF(D.PreviousSales, 0) AS Growth
,   FORMAT((D.Month_Sales - D.PreviousSales) / NULLIF(D.PreviousSales, 0), 'P') AS GrowthFormattedText
FROM
(
    SELECT
        T.*
        -- Second parameter (1 here, the default, optional) is the number of rows to lag by.
        -- Third parameter (NULL here, the default, optional) is the value used when no previous row exists.
    ,   LAG(T.Month_Sales, 1, NULL) OVER (ORDER BY T.Month_ID) AS PreviousSales
    FROM
        dbo.SO_69565661 AS T
) AS D;

See the query in action at

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=0246900438b8eb13d70a4332c1bdad6f
