
Multi-row instead of single-row data transformation with a trigger in MySQL

I have this query:

CREATE TRIGGER move_form_data
AFTER INSERT ON schema.original_table
FOR EACH ROW
INSERT INTO schema.new_table (name, street_address, 
            street_address_line_2, city, state, zip, country, dob)
SELECT name, street_address, street_address_line_2, city, state, zip, country, dob 
from view_data_submits

which calls this view:

CREATE VIEW view_data_submits AS 

SELECT  
        MAX(CASE WHEN element_label = 0 THEN element_value end) AS name,
        MAX(CASE WHEN element_label = 1 THEN element_value end) AS street_address,
        MAX(CASE WHEN element_label = 2 THEN element_value end) AS street_address_line_2,
        MAX(CASE WHEN element_label = 3 THEN element_value end) AS city,
        MAX(CASE WHEN element_label = 4 THEN element_value end) AS state,
        MAX(CASE WHEN element_label = 5 THEN element_value end) AS zip,
        MAX(CASE WHEN element_label = 6 THEN element_value end) AS country,
        MAX(CASE WHEN element_label = 7 THEN element_value end) AS dob
FROM schema.original_table
WHERE group_id = (select MAX(group_id) from schema.original_table)
group by group_id

I want one row back, and the statement works as intended when run outside the trigger, with just this code:

INSERT INTO schema.new_table (name, street_address, 
                street_address_line_2, city, state, zip, country, dob)
    SELECT name, street_address, street_address_line_2, city, state, zip, country, dob 
    from view_data_submits

Currently, when the user submits a form, it gives me back the inserted rows, but it transforms from the original table to the new table like this:

# id, name, street_address, street_address_line_2, city, state, zip, country, dob
2, fsa asdadFQ, , , , , , , 
3, fsa asdadFQ, BOOGYBOOGYBOOGY, , , , , , 
4, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, , , , , 
5, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, , , , 
6, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, , , 
7, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, , 
8, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 
9, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 2014-02-05  <--only row that I want (=the total form submission)

instead of just:

# id, name, street_address, street_address_line_2, city, state, zip, country, dob

9, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 2014-02-05

I have a feeling it is either something to do with the FOR EACH ROW syntax, or that the application saves in a compounding fashion somehow. I am leaning towards the first one.

Anyone have any suggestions for a remedy? I almost feel as though it's some noob mistake that I just forgot about... haha.

EDIT (per request):

here is the select * from the original table where the max id is being pulled:

# id, form_id, element_label, element_value, group_id
----+--------+--------------+--------------+---------
 207,       2,             0,          name,       25
 208,       2,             1,     address 1,       25
 209,       2,             2,     address 2,       25
 210,       2,             3,          city,       25
 211,       2,             4,         state,       25
 212,       2,             5,           zip,       25
 213,       2,             6,       country,       25
 214,       2,             7,           dob,       25

Since the values are stored in BLOB form, I replaced them with what they represent; I just pulled the newest inserted data.

I have narrowed this down to the application inserting each field separately, which causes the trigger, with its FOR EACH ROW syntax, to fire on a row-by-row basis. This syntax is required in MySQL, which only supports row-level triggers, not statement-level triggers as in Oracle and some other databases.

I have asked a separate question on a workaround for this here: Workaround for FOR EACH ROW in MySQL

This looks like an EAV schema (oh! the joys!).

It looks like the root problem is that the application isn't inserting a "row" the way you want to see it; it's inserting multiple rows into the same table, with each row representing a single attribute value.

The application is using Entity-Attributute-Value (EAV) model, and what you want is a row that looks like a traditional relational model.

What that rather ugly "MAX(),MAX(),MAX() ... GROUP BY" query is doing is converting all those EAV rows into columns of a single row.


It looks like you want to do that conversion "on-the-fly" and maintain the contents of the target_table whenever rows are inserted into the original_table.

If I were solving that problem, I would include the group_id in my target_table, since that's the value that is relating all the individual EAV rows together (as demonstrated in your view query.)

And I definitely would NOT use a SELECT MAX(group_id) query to reference the value on the row that was just inserted into original_table. In the context of an AFTER INSERT trigger, I already have the group_id value of the row that was just inserted; it's available to me as NEW.group_id.

(The real reason I would avoid using a MAX(group_id) query to get that value is that I have no guarantee that some other process won't insert a larger group_id value while my process is running; I'm not guaranteed that MAX(group_id) will return the group_id that was just inserted. Granted, I won't ever see that problem happen in single-user testing; I'd have to include some deliberate delays in my processing, and have two processes running at the same time, in order to get it to happen. This is one of those problems that pops up in production rather than in testing, basically because we don't bother to set up the test case to discover it.)

If I only want a single row in my target_table for each group_id value, I would create a unique constraint on the group_id column in my target_table. Then I would use an "upsert"-type function to update the row if it already exists, or insert a row if one doesn't exist.
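A minimal sketch of that constraint, assuming the target_table and group_id names used in this answer (adjust to your actual schema):

```sql
-- Allow at most one row per group_id in target_table;
-- this is what makes ON DUPLICATE KEY UPDATE fire on re-inserts.
ALTER TABLE target_table
  ADD UNIQUE KEY uq_target_table_group_id (group_id);
```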

I can easily do that in MySQL with an INSERT ... ON DUPLICATE KEY UPDATE statement. This requires a unique constraint, but we already have that covered. One downside of this statement is that if my target_table has an AUTO_INCREMENT column, it will "burn" through auto_increment values even when the row already exists.

Based on what you have in your trigger/view, I could do something like this:

INSERT INTO target_table (group_id, name, street_address, ... )
SELECT o.group_id,
       MAX(CASE WHEN o.element_label = 0 THEN o.element_value end) AS name,
       MAX(CASE WHEN o.element_label = 1 THEN o.element_value end) AS street_address,
       MAX(CASE WHEN o.element_label = 2 THEN o.element_value end) AS street_address_line_2,
       MAX(CASE WHEN o.element_label = 3 THEN o.element_value end) AS city,
       MAX(CASE WHEN o.element_label = 4 THEN o.element_value end) AS state,
       MAX(CASE WHEN o.element_label = 5 THEN o.element_value end) AS zip,
       MAX(CASE WHEN o.element_label = 6 THEN o.element_value end) AS country,
       MAX(CASE WHEN o.element_label = 7 THEN o.element_value end) AS dob
  FROM schema.original_table o
 WHERE o.group_id = NEW.group_id
 GROUP BY o.group_id
    ON DUPLICATE KEY
UPDATE name                  = VALUES(name)
     , street_address        = VALUES(street_address)
     , street_address_line_2 = VALUES(street_address_line_2)
     , city                  = VALUES(city)
     , state                 = VALUES(state)
     , zip                   = VALUES(zip)
     , country               = VALUES(country)
     , dob                   = VALUES(dob);

Note that I'm counting on the UNIQUE constraint on target_table(group_id) to throw a "duplicate key" exception when it attempts to insert a row with a group_id value that already exists in target_table. When that happens, this statement will turn into an UPDATE statement, with an implied WHERE group_id = VALUES(group_id) (whatever columns were involved in the unique key constraint violation.)

This is the simplest approach, as long as burning through AUTO_INCREMENT values isn't a concern.

I'm not limited to the INSERT ... ON DUPLICATE KEY statement; I can "roll my own" UPSERT logic. BUT... I want to be cognizant of possible race conditions: if I perform a SELECT and then a subsequent INSERT, I leave a small window where another process can sneak in.

I could instead use a NOT EXISTS predicate to test for the existence of the row:

INSERT INTO target_table ( ...
SELECT ...
  FROM original_table o
 WHERE o.group_id = NEW.group_id
   AND NOT EXISTS (SELECT 1 FROM target_table d WHERE d.group_id = NEW.group_id)

Then I'd test whether a row was inserted (by checking the number of affected rows), and if no row was inserted, I could attempt an update. (I'm banking on the SELECT statement returning a single row.)
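In a stored program, that affected-rows check could be sketched with MySQL's ROW_COUNT() function. This is an illustration only, with the pivot columns elided the same way as in the queries above:

```sql
-- Immediately after the INSERT ... WHERE NOT EXISTS ... statement:
IF ROW_COUNT() = 0 THEN
   -- No row was inserted, so a row for this group_id already exists; update it.
   UPDATE target_table t
     JOIN ( SELECT o.group_id,
                   MAX(CASE WHEN o.element_label = 0 THEN o.element_value END) AS name
                   -- ... remaining MAX(CASE ...) columns, as in the view ...
              FROM original_table o
             WHERE o.group_id = NEW.group_id
             GROUP BY o.group_id ) s
       ON s.group_id = t.group_id
      SET t.name = s.name  -- ... and the remaining columns ...
      ;
END IF;
```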

For better performance, I might use an anti-join pattern to do the same check (for existence of an existing row), but for one row, the NOT EXISTS (subquery) is fine, and I think it's easier to understand.

INSERT INTO target_table ( ...
SELECT ...
  FROM original_table o
  LEFT
  JOIN target_table t
    ON t.group_id = NEW.group_id
 WHERE o.group_id = NEW.group_id
   AND t.group_id IS NULL

(That SELECT from original_table might need to be wrapped as an inline view, since it references the same table that's being inserted into. Turning that query into a derived table should fix that, if it's a problem.)


I said I "could" use that query from the view in my trigger. But that's not the approach I'd choose to use. It's not necessary. I don't really need to run a MAX(), MAX(), MAX() query to get every column.

I have all the values of the row being inserted into original_table, so I already know which element_label is being inserted, and there's really only one column that has to be changed in the target_table. (Do I want the MAX(element_value), or do I really just want the value that was just inserted?)

Here's the approach I would use in the trigger. I'd avoid running a query against the original_table at all, and just do the upsert on the one column in target_table:

IF NEW.element_label = 0 THEN
   -- name
   INSERT INTO target_table (group_id,       `name`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `name` = VALUES(`name`);
ELSEIF NEW.element_label = 1 THEN
   -- street_address
   INSERT INTO target_table (group_id,       `street_address`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `street_address` = VALUES(`street_address`);
ELSEIF NEW.element_label = 2 THEN
   -- street_address2
   INSERT INTO target_table (group_id,       `street_address2`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `street_address2` = VALUES(`street_address2`);
ELSEIF NEW.element_label = 3 THEN
   -- city
   INSERT INTO target_table (group_id,       `city`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `city` = VALUES(`city`);
ELSEIF NEW.element_label = 4 THEN
   ...
END IF;

I know that's not very pretty, but I think it's the best approach if the maintenance of target_table has to be done at the time rows are inserted into the original table. (The problem isn't really the database here; the problem is the EAV model, or really, the "impedance mismatch" between the EAV model (one row for each attribute value) and the relational model (one column in each row for each attribute value).)

This isn't any uglier than the MAX(),MAX(),MAX() query.

I would also ditch the AUTO_INCREMENT id in the target table, and just use group_id (value from the original_table) as the primary key in my target_table, since I only want one row for each group_id.
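That target table might then look something like this; the column types are guesses, since the question doesn't show the table's actual DDL:

```sql
CREATE TABLE target_table (
  group_id               INT NOT NULL PRIMARY KEY,  -- carried over from original_table
  name                   VARCHAR(255),
  street_address         VARCHAR(255),
  street_address_line_2  VARCHAR(255),
  city                   VARCHAR(100),
  state                  VARCHAR(100),
  zip                    VARCHAR(32),
  country                VARCHAR(100),
  dob                    DATE
);
```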


UPDATE

You have to change the delimiter from semicolon to something else when the trigger body contains semicolons. Documentation here: http://dev.mysql.com/doc/refman/5.5/en/trigger-syntax.html

eg

DELIMITER $$

CREATE TRIGGER trg_original_table_ai
AFTER INSERT ON original_table
FOR EACH ROW
BEGIN
   IF NEW.element_label = 0 THEN
      -- name
      INSERT INTO target_table (group_id,       `name`) 
      VALUES (NEW.group_id, NEW.element_value)
      ON DUPLICATE KEY UPDATE                   `name` = VALUES(`name`);
   ELSEIF NEW.element_label = 1 THEN
      -- street_address
      INSERT INTO target_table (group_id,       `street_address`) 
      VALUES (NEW.group_id, NEW.element_value)
      ON DUPLICATE KEY UPDATE                   `street_address` = VALUES(`street_address`);
   END IF;
END$$

DELIMITER ;
