Pipelined functions

Question

Can someone provide an example of how to use a parallel table function in Oracle PL/SQL? We need to run massive queries for each of 15 years and combine the results.

SELECT * 
  FROM Table(TableFunction(cursor(SELECT * FROM year_table)))

...is effectively what we want. The innermost select returns all the years, and the table function takes each year, runs a massive query for it, and returns a collection. The problem is that all the years are being fed to a single invocation of the table function; we would prefer the table function to be called in parallel, once for each year. We tried all sorts of partitioning by hash and range, and it didn't help.

Also, can we drop the keyword PIPELINED from the function declaration, since we are not performing any transformation and just need the aggregate of the result set?

Answer 1

There's an excellent write-up here.

There are alternative approaches, e.g. a 'master' job that cursors through YEAR_TABLE and submits a DBMS_JOB to process each year. Each 'year job' would insert its results into a table.

Once all the spawned jobs are finished, you just pull the results from the table.
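A minimal sketch of that 'master' approach, under some assumptions: `process_year` is a hypothetical procedure that runs the big query for one year and inserts its rows into a results table, and `yr` is a hypothetical numeric column of YEAR_TABLE. Neither name comes from the question.

```sql
-- Hypothetical names: process_year (runs one year's query and inserts the
-- rows into a results table) and the yr column of YEAR_TABLE.
DECLARE
   v_job BINARY_INTEGER;
BEGIN
   FOR r IN (SELECT yr FROM year_table) LOOP
      -- One one-off job per year; next_date defaults to SYSDATE
      DBMS_JOB.SUBMIT(
         job  => v_job,
         what => 'process_year(' || r.yr || ');'
      );
   END LOOP;
   COMMIT;  -- submitted jobs are only picked up by the job queue after commit
END;
/

-- One-off jobs leave the queue once they complete successfully, so a count
-- of zero here suggests the results table is ready to be read.
SELECT COUNT(*) AS jobs_remaining
  FROM user_jobs
 WHERE what LIKE 'process_year(%';
```

How many of these jobs actually run at the same time is capped by the JOB_QUEUE_PROCESSES initialization parameter, so with 15 years you may see them run in batches rather than all at once.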

PS. I suspect parallel pipelined won't do what you want, though. I created a large table in which just three rows had a specific value. I then created a parallel pipelined function that simply pushed out the SID of the executing process (see below) and the number of rows it processed. I had an SQL statement that picked out those three rows, and passed that as the cursor into the function. Mostly the function pushed out two different SIDs (two being what EXPLAIN PLAN told me had been picked as the degree of parallelism). Sometimes it showed that two processes had executed, but all three rows were processed by one of them.

So it isn't the case that rows are picked from the cursor and handed out one by one to the parallel slaves to be processed; rather, each parallel process is given a slice of the driving table to deal with. With a small table, Oracle probably wouldn't consider parallel execution at all, and even if it did, it might just allocate the first 50 rows to the first process, and so on.

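-- Assumes TAB_CHAR_4000 is a schema-level collection type, e.g. TABLE OF VARCHAR2(4000).
CREATE OR REPLACE FUNCTION test_pp(p_source IN SYS_REFCURSOR)
   RETURN TAB_CHAR_4000 PIPELINED
   PARALLEL_ENABLE (PARTITION p_source BY ANY)
IS
   v_num NUMBER;
BEGIN
   FETCH p_source INTO v_num;
   WHILE p_source%FOUND LOOP
      -- One output row per input row, tagged with the session that handled it
      PIPE ROW(sys_context('USERENV','SID'));
      FETCH p_source INTO v_num;
   END LOOP;
   -- Summary row: this session's SID and the number of rows it fetched
   PIPE ROW(sys_context('USERENV','SID')||':'||p_source%ROWCOUNT);
   CLOSE p_source;
   RETURN;
END test_pp;
/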
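
For anyone who wants to repeat the experiment, the function might be exercised roughly like this. This is only a sketch: the definition of TAB_CHAR_4000 is my guess from its name, and `big_table`, its numeric `id` column and its `flag` column are hypothetical stand-ins for the test table described above.

```sql
-- Assumed definition of the collection type the function returns
-- (create this before compiling test_pp)
CREATE OR REPLACE TYPE tab_char_4000 AS TABLE OF VARCHAR2(4000);
/

-- big_table, id and flag are hypothetical; the cursor must return a single
-- numeric column because test_pp fetches into a NUMBER variable.
SELECT column_value
  FROM TABLE(test_pp(CURSOR(
          SELECT /*+ PARALLEL(t, 2) */ t.id
            FROM big_table t
           WHERE t.flag = 'X')));
```

Each plain SID row corresponds to one input row, and each `SID:count` row reports how many rows a given process fetched, which is how you can see whether the work was really spread across the parallel processes.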

Answer 1
2 2010-03-12 01:04:19
