
How to upsert pandas DataFrame to MySQL with SQLAlchemy

I'm pushing data from a DataFrame into MySQL. Right now it only adds new rows to the table if the data does not already exist (append). This works perfectly, but I also want my code to check whether a record already exists and, if so, update it. So I need append + update. I don't know how to start fixing this, as I'm stuck. Has anyone tried this before?

This is my code:

from sqlalchemy import create_engine

# my_df is an existing pandas DataFrame
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                        .format(user="root",
                                pw="*****",
                                db="my_db"))
my_df.to_sql('my_table', con=engine, if_exists='append')

You can use the following solution on the database side:

First: create a table that receives the data inserted from pandas (let's call it test):

CREATE TABLE `test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Second: create a table for the resulting data (let's call it cumulative_test) with exactly the same structure as test:

CREATE TABLE `cumulative_test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Third: set a trigger on the test table so that each insert into it inserts or updates the matching record in the second table, like:

DELIMITER $$

CREATE
    /*!50017 DEFINER = 'root'@'localhost' */
    TRIGGER `before_test_insert` BEFORE INSERT ON `test` 
    FOR EACH ROW BEGIN
    DECLARE _id INT;
    
    SELECT id INTO _id
    FROM `cumulative_test` WHERE `cumulative_test`.`name` = new.name;
    
    IF _id IS NOT NULL THEN
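        -- restrict the update to the matching row; without WHERE every row is updated
        UPDATE cumulative_test
        SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + new.capacity
        WHERE `cumulative_test`.`id` = _id;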
     ELSE 
        INSERT INTO `cumulative_test` (`name`, `capacity`) 
        VALUES (NEW.name, NEW.capacity);
    END IF; 
END;
$$

DELIMITER ;

This way you keep inserting values into the test table and get the calculated results in the second table. The logic inside the trigger can be adapted to your needs.
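To make the trigger's intended insert-or-update behaviour concrete, here is a small in-memory model of it in Python. It is only a sketch: the function name `simulate_trigger`, the `accumulate` flag and the sample names `tank_a` / `tank_b` are made up for illustration, and a plain dict stands in for the cumulative_test table. It shows that the trigger as written adds the new capacity to the existing one, and what changes if you adapt it to overwrite instead.

```python
def simulate_trigger(cumulative, inserted_rows, accumulate=True):
    """In-memory model of the BEFORE INSERT trigger on `test`.

    cumulative: dict mapping name -> capacity (stands in for cumulative_test)
    inserted_rows: iterable of (name, capacity) tuples inserted into `test`
    accumulate: True adds capacities (as the trigger does); False overwrites
    """
    result = dict(cumulative)  # work on a copy, leave the caller's dict untouched
    for name, capacity in inserted_rows:
        if name in result and accumulate:
            result[name] += capacity   # UPDATE branch: add to the existing row
        else:
            result[name] = capacity    # INSERT branch, or overwrite when not accumulating
    return result


# Hypothetical sample data, not taken from the original post.
existing = {"tank_a": 10}
rows = [("tank_a", 5), ("tank_b", 7)]

summed = simulate_trigger(existing, rows)
replaced = simulate_trigger(existing, rows, accumulate=False)
print(summed)    # existing capacity plus the new one for tank_a
print(replaced)  # new capacity replaces the old one for tank_a
```

If you want a plain "update to the latest value" rather than a running total, the overwrite variant corresponds to setting capacity = new.capacity in the trigger's UPDATE statement.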

Similar to the approach used for PostgreSQL, you can use INSERT … ON DUPLICATE KEY UPDATE in MySQL:

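import pandas as pd
import sqlalchemy as sa

# `engine` is a SQLAlchemy engine for the MySQL database, as created in the question
with engine.begin() as conn: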
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    # (VALUES() in the UPDATE clause is deprecated as of MySQL 8.0.20 but still works)
    conn.execute(
        sa.text(
            """\
            INSERT INTO main_table (id, txt) 
            SELECT id, txt FROM temp_table
            ON DUPLICATE KEY UPDATE txt = VALUES(txt)
            """
        )
    )

    # step 3 - confirm results
    result = conn.execute(
        sa.text("SELECT * FROM main_table ORDER BY id")
    ).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
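For a DataFrame with more columns, the merge statement in step 2 can be generated instead of typed by hand. The following is a minimal sketch, not part of the original answer: `build_upsert_sql` is a hypothetical helper, and it assumes the temporary table has the same column names as the main table and that the key columns you pass are covered by a primary key or unique index. It mirrors the VALUES() form used above.

```python
def build_upsert_sql(main_table, temp_table, columns, key_columns):
    """Build an INSERT ... SELECT ... ON DUPLICATE KEY UPDATE statement.

    columns: all column names to copy from temp_table to main_table
    key_columns: the primary/unique key columns, which are not updated
    """
    def quote(name):
        # Backtick-quote a MySQL identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    update_cols = [c for c in columns if c not in key_columns]
    if not update_cols:
        raise ValueError("there are no non-key columns to update")

    col_list = ", ".join(quote(c) for c in columns)
    assignments = ", ".join(
        f"{quote(c)} = VALUES({quote(c)})" for c in update_cols
    )
    return (
        f"INSERT INTO {quote(main_table)} ({col_list}) "
        f"SELECT {col_list} FROM {quote(temp_table)} "
        f"ON DUPLICATE KEY UPDATE {assignments}"
    )


sql = build_upsert_sql("main_table", "temp_table", ["id", "txt"], ["id"])
print(sql)
```

In the code above you would pass list(df.columns) as the columns and run the result with conn.execute(sa.text(sql)) in place of the hand-written statement in step 2.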

