
How to speed up update query on massive table

I am currently migrating one of our existing databases to a new ontology. The database follows a star schema, with observation_fact at the center of the star and concept_dimension as a lookup table. To move to the new ontology, I need to replace the concept_cd values in observation_fact with slightly different codes that match the concept_cd values in the new ontology.

I have tried writing an update query to accomplish this migration; however, it has been running for 5 days and I don't think it is going to finish anytime soon. Both relevant tables are indexed on concept_cd.

This is the query that I initially wrote:

Update observation_fact ofact
Set concept_cd = q.cd
From (Select ofact2.ctid, Case 
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD10-CM'  Then replace(ofact2.concept_cd, 'ICD10-CM:', 'ICD10CM:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD10-PCS' Then replace(ofact2.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD9' And cdim.concept_path like '\\i2b2\\Diagnoses\\%'  Then replace(ofact2.concept_cd, 'ICD9:', 'ICD9CM:')
    When split_part(ofact2.concept_cd, ':', 1) = 'ICD9' And cdim.concept_path like '\\i2b2\\Procedures\\%' Then replace(ofact2.concept_cd, 'ICD9:', 'ICD9PROC:')
  End as cd
  From observation_fact ofact2
  Left Outer Join concept_dimension_bak cdim
  On ofact2.concept_cd = cdim.concept_cd
) as q
Where ofact.ctid = q.ctid;

It felt very awkward to write: observation_fact has no true primary key or composite key, so I had to use ctid. I also reference observation_fact twice, which, according to the answer to "Speed up Postgres Update on Large Table", is a bad idea and probably part of the problem. I used a left outer join because some of the concept_cd values in observation_fact do not exist in concept_dimension_bak. As you can see, the ICD10 replacements are easy; for ICD9, however, I need to look up the code in the old concept_dimension table to figure out which type of code it is and replace it accordingly.

I expect this update query to perform the appropriate replacement on any rows in observation_fact where the case statement matches and ignore everything else.

First, updating all rows in the table is going to take time. Sometimes, it is faster to create a new table with all the modified data, truncate the original table, and re-load it.

Second, you are referencing observation_fact twice, but that doesn't seem necessary. I think this does what you want:

-- Join observation_fact to the lookup table directly instead of self-joining on ctid.
update observation_fact ofact
    set concept_cd = (case when split_part(ofact.concept_cd, ':', 1) = 'ICD10-CM'
                           then replace(ofact.concept_cd, 'ICD10-CM:', 'ICD10CM:')
                           when split_part(ofact.concept_cd, ':', 1) = 'ICD10-PCS'
                           then replace(ofact.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
                           when split_part(ofact.concept_cd, ':', 1) = 'ICD9' and cdim.concept_path like '\\i2b2\\Diagnoses\\%'
                           then replace(ofact.concept_cd, 'ICD9:', 'ICD9CM:')
                           when split_part(ofact.concept_cd, ':', 1) = 'ICD9' and cdim.concept_path like '\\i2b2\\Procedures\\%'
                           then replace(ofact.concept_cd, 'ICD9:', 'ICD9PROC:')
                      end)
from concept_dimension_bak cdim
where ofact.concept_cd = cdim.concept_cd;

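Note that the inner join skips rows whose concept_cd is missing from concept_dimension_bak, and the case expression returns NULL when no branch matches, so you may need to handle the unmatched values explicitly.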
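If the goal is to leave every non-matching row untouched, as the question describes, one possible variant is to split the work into two statements: the ICD10 rows need no lookup at all, and only the ICD9 rows need the join. This is a sketch, not a tested migration. It assumes that each concept_cd appears at most once in concept_dimension_bak (otherwise the row picked by the join is arbitrary), and it touches only rows that actually change, so nothing is set to NULL.

```sql
-- ICD10 codes: no lookup needed, so no join. Rows that are missing from
-- concept_dimension_bak are still converted.
update observation_fact
set concept_cd = case split_part(concept_cd, ':', 1)
                     when 'ICD10-CM'  then replace(concept_cd, 'ICD10-CM:', 'ICD10CM:')
                     when 'ICD10-PCS' then replace(concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
                 end
where split_part(concept_cd, ':', 1) in ('ICD10-CM', 'ICD10-PCS');

-- ICD9 codes: the lookup decides between diagnosis and procedure codes.
-- The where clause limits the update to rows that match one of the two paths,
-- so the case expression always has a matching branch.
update observation_fact ofact
set concept_cd = case
                     when cdim.concept_path like '\\i2b2\\Diagnoses\\%'
                     then replace(ofact.concept_cd, 'ICD9:', 'ICD9CM:')
                     else replace(ofact.concept_cd, 'ICD9:', 'ICD9PROC:')
                 end
from concept_dimension_bak cdim
where ofact.concept_cd = cdim.concept_cd
  and split_part(ofact.concept_cd, ':', 1) = 'ICD9'
  and (cdim.concept_path like '\\i2b2\\Diagnoses\\%'
       or cdim.concept_path like '\\i2b2\\Procedures\\%');
```

Restricting each statement with a where clause also means PostgreSQL does not write a new row version for rows whose code would stay the same.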

Instead of updating the table in place, try creating a new table using the logic you already have in your SQL; this is usually faster. Once the new table has been created, rename the old table and then rename the new table to observation_fact.

To reiterate:

1. Create the new table

insert into observation_fact_new
select ...
from observation_fact;

2. Rename the old table and do the sanity checks

alter table observation_fact rename to observation_fact_old;

3. Rename the new table to observation_fact

alter table observation_fact_new rename to observation_fact;

After your checks and tests are done, drop the old table

drop table observation_fact_old;
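The steps above might be sketched end to end as follows. This is only an illustration under stated assumptions: patient_num and start_date are placeholder column names standing in for the real column list of observation_fact, the index name is made up, and concept_cd is assumed to appear at most once in concept_dimension_bak (otherwise the left join would duplicate fact rows). Note that create table ... as does not copy indexes, constraints, or defaults, so those have to be recreated on the new table.

```sql
-- 1. Build the new table in a single pass over observation_fact.
--    Placeholder columns: list every column of your real table here.
create table observation_fact_new as
select ofact.patient_num,
       ofact.start_date,
       case
           when split_part(ofact.concept_cd, ':', 1) = 'ICD10-CM'
           then replace(ofact.concept_cd, 'ICD10-CM:', 'ICD10CM:')
           when split_part(ofact.concept_cd, ':', 1) = 'ICD10-PCS'
           then replace(ofact.concept_cd, 'ICD10-PCS:', 'ICD10PCS:')
           when split_part(ofact.concept_cd, ':', 1) = 'ICD9'
                and cdim.concept_path like '\\i2b2\\Diagnoses\\%'
           then replace(ofact.concept_cd, 'ICD9:', 'ICD9CM:')
           when split_part(ofact.concept_cd, ':', 1) = 'ICD9'
                and cdim.concept_path like '\\i2b2\\Procedures\\%'
           then replace(ofact.concept_cd, 'ICD9:', 'ICD9PROC:')
           else ofact.concept_cd          -- everything else is kept unchanged
       end as concept_cd
from observation_fact ofact
left join concept_dimension_bak cdim
       on ofact.concept_cd = cdim.concept_cd;

-- Recreate the index after loading, which is cheaper than maintaining it
-- row by row during the load (hypothetical index name).
create index observation_fact_new_concept_cd_idx
    on observation_fact_new (concept_cd);

-- Sanity check: the row counts should be identical.
select (select count(*) from observation_fact)
     = (select count(*) from observation_fact_new) as same_row_count;

-- 2. and 3. Swap the tables in one transaction so that no other session
--    sees a moment without an observation_fact table.
begin;
alter table observation_fact rename to observation_fact_old;
alter table observation_fact_new rename to observation_fact;
commit;
```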
