简体   繁体   中英

How to replace all subsets of characters based on values of other tables in pl/pgsql?

I've been doing some research on how to replace a subset of string of characters of a single row base on the values of the columns of other rows, but was not able to do so since the update are only for the first row values of the other table. So I'm planning to insert this in a loop in a plpsql function.

Here are the snippet of my tables. Main table:

 Table "public.tbl_main"
        Column         |  Type  | Modifiers 
-----------------------+--------+-----------
 maptarget             | text   | 
 expression            | text   | 


 maptarget |                 expression
-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 43194-0   | 363787002:70434600=(386053000:704347000=(414237002:704320005=259470008,704318007=118539007,704319004=50863008),704327008=122592007,246501002=703690001,370132008=30766002)

Look-up table:

Table "public.tbl_values"
        Column         |  Type  | Modifiers 
-----------------------+--------+-----------
 conceptid             | bigint | 
 term                  | text   |

 conceptid |                   term                   
-----------+------------------------------------------
 386053000 | Patient evaluation procedure (procedure)
 363787002 | Observable entity (observable entity)
 704347000 | Observes (attribute)
 704320005 | Towards (attribute)
 704318007 | Property type (attribute)

I want to create a function that will replace all numeric values in the tbl_main.expression columns with their corresponding tbl_values.term using the tbl_values.conceptid as the link to each numeric values in the expression string.

I'm stuck currently in the looping part since I'm a newbie in LOOP of plpgsql. Here is the rough draft of my function.

--create first a test table
drop table if exists tbl_test;
create table tbl_test as select * from tbl_main limit 1;
--

create or replace function test () 
 RETURNS SETOF tbl_main
 LANGUAGE plpgsql
AS $function$
declare
 resultItem tbl_main;
 v_mapTarget text;
 v_expression text;
 ctr int;
begin
  v_mapTarget:='';
  v_expression:='';
  ctr:=1;

  for resultItem in (select * from tbl_test) loop
     v_mapTarget:=resultItem.mapTarget;
     select into v_expression expression from ee;
     raise notice 'parameter used: %',v_mapTarget;
     raise notice 'current expression: %',v_expression;

     update ee set expression=replace(v_expression, new_exp::text, term) from (select new_exp::text, term from tbl_values offset ctr limit 1) b ; 
     ctr:=ctr+1;
     raise notice 'counter: %', ctr;
     v_expression:= (select expression from ee);
     resultItem.expression:= v_expression;
     raise notice 'current expression: %',v_expression;
return next resultItem;
 end loop;
 return;
 end;
$function$;

Any further information will be much appreciated.
My Postgres version:

PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

PL/pgSQL function with dynamic SQL

Looping is always a measure of last resort. Even in this case it is substantially cheaper to concatenate a query string using a query, and execute it once :

CREATE OR REPLACE FUNCTION f_make_expression(_expr text, OUT result text) AS
$func$
BEGIN
   EXECUTE (
      SELECT 'SELECT ' || string_agg('replace(', '') || '$1,'
           || string_agg(format('%L,%L)', conceptid::text, v.term), ','
                         ORDER BY conceptid DESC)
      FROM  (
         SELECT conceptid::bigint
         FROM   regexp_split_to_table($1, '\D+') conceptid
         WHERE  conceptid <> ''
         ) m
      JOIN   tbl_values v USING (conceptid)
      )
   USING _expr
   INTO result;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT *, f_make_expression(expression) FROM tbl_main;

However , if not all conceptid have the same number of digits, the operation could be ambiguous. Replace conceptid with more digits first to avoid that - ORDER BY conceptid DESC does that - and make sure that replacement strings do not introduce ambiguity (numbers that might be replaced in the the next step). Related answer with more on these pitfalls:

The token $1 is used two different ways here, don't be misled:

  1.  regexp_split_to_table(  , '\\D+') 

    This one references the first function parameter _expr . You could as well use the parameter name.

  2.  || '  ,' 

    This concatenates into the SQL string a references to the first expression passed via USING clause to EXECUTE . Parameters of the outer function are not visible inside EXECUTE , you have to pass them explicitly.

It's pure coincidence that $1 ( _expr ) of the outer function is passed as $1 to EXECUTE . Might as well hand over $7 as third expression in the USING clause ( $3 ) ...

I added a debug function to the fiddle. With a minor modification you can output the generated SQL string to inspect it:

SQL function

Here is a pure SQL alternative. Probably also faster:

CREATE OR REPLACE FUNCTION f_make_expression_sql(_expr text)
  RETURNS text AS
$func$
SELECT string_agg(CASE WHEN $1 ~ '^\d'
                       THEN txt || COALESCE(v.term, t.conceptid) 
                       ELSE COALESCE(v.term, t.conceptid) || txt END
                , '' ORDER BY rn) AS result
FROM  (
   SELECT *, row_number() OVER () AS rn
   FROM  (
      SELECT regexp_split_to_table($1, '\D+') conceptid
           , regexp_split_to_table($1, '\d+') txt
      ) sub
   ) t
LEFT  JOIN tbl_values v ON v.conceptid = NULLIF(t.conceptid, '')::int
$func$  LANGUAGE sql STABLE;

In Postgres 9.4 this can be much more elegant with two new features:

CREATE OR REPLACE FUNCTION f_make_expression_sql(_expr text)
  RETURNS text AS
$func$
SELECT string_agg(CASE WHEN $1 ~ '^\d'
                       THEN txt || COALESCE(v.term, t.conceptid) 
                       ELSE COALESCE(v.term, t.conceptid) || txt END
                , '' ORDER BY rn) AS result
FROM   ROWS FROM (
          regexp_split_to_table($1, '\D+')
        , regexp_split_to_table($1, '\d+')
       ) WITH ORDINALITY AS t(conceptid, txt, rn)
LEFT   JOIN tbl_values v ON v.conceptid = NULLIF(t.conceptid, '')::int
$func$  LANGUAGE sql STABLE;

SQL Fiddle demonstrating all for Postgres 9.3.

There's also another way, without creating functions... using "WITH RECURSIVE". Used it with lookup talbe of thousands of rows.

You'll need to change following table names and columns to your names:

tbl_main, strsourcetext, strreplacedtext;

lookuptable, strreplacefrom, strreplaceto.

WITH RECURSIVE replaced AS (
(SELECT
strsourcetext,
strreplacedtext,
array_agg(strreplacefrom ORDER BY length(strreplacefrom) DESC, strreplacefrom, strreplaceto) AS arrreplacefrom,
array_agg(strreplaceto ORDER BY length(strreplacefrom) DESC, strreplacefrom, strreplaceto) AS arrreplaceto,
count(1) AS intcount,
1 AS intindex
FROM tbl_main, lookuptable WHERE tbl_main.strsourcetext LIKE '%' || strreplacefrom || '%'
GROUP BY strsourcetext)
UNION ALL 
SELECT
strsourcetext,
replace(strreplacedtext, arrreplacefrom[intindex], arrreplaceto[intindex]) AS strreplacedtext,
arrreplacefrom,
arrreplaceto,
intcount,
intindex+1 AS intindex
FROM replaced WHERE intindex<=intcount
)
SELECT strsourcetext,
(array_agg(strreplacedtext ORDER BY intindex DESC))[1] AS strreplacedtext
FROM replaced 
GROUP BY strsourcetext

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM