Avoid inserting records that are already in the SQL table

Question

I am trying to insert a pandas DataFrame into SQL using SQLAlchemy. The table already exists in the database with three columns: ID, Brand and Price. ID is an identity column. How can I check, before inserting each row from the DataFrame, whether the Brand already exists?

    import pandas as pd

    cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

    df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

    from sqlalchemy import create_engine
    import urllib.parse

    # URL-encode the ODBC connection string so it can be embedded in the engine URL
    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=server;DATABASE=mydb;UID=user;PWD=psw")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

    # suppose df is the data-frame that we want to insert in database
    # (this appends every row, including brands that are already in the table)
    df.to_sql(name='mytable', con=engine, index=False, if_exists='append')

    print("inserted")

Answer 1

You are really looking at a 30-year-old relational database insert pattern: INSERT a row only if it is not already present under a unique index. (An auto-increment column is not a meaningful unique key.)

I've used MariaDB, but the approach is the same across all DBMSs. Just stick to the SQL-92 standard.

1. Name your temp table.
2. Name the real table.
3. Define which columns make up the unique key.

Table definition

    create table car (
        id double not null AUTO_INCREMENT,
        brand varchar(20) not null,
        price double,
        primary key (id, brand),
        unique key (brand)
    )
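To illustrate why the unique key on brand matters more than the auto-increment id, here is a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for MariaDB, which is an assumption made only so the example runs without a server; the column types are adapted to SQLite accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite stand-in for the table above: auto-assigned id, unique brand
conn.execute("""
    create table car (
        id integer primary key autoincrement,
        brand varchar(20) not null unique,
        price double
    )
""")
conn.execute("insert into car (brand, price) values (?, ?)", ("Honda Civic", 22000))

try:
    # Same brand again: the id would be new, so only the unique key on brand
    # can recognise this row as a duplicate.
    conn.execute("insert into car (brand, price) values (?, ?)", ("Honda Civic", 23000))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("select count(*) from car").fetchone()[0]
print(duplicate_rejected, row_count)
```

The second insert is refused by the unique key, so the table still holds a single Honda Civic row. The id column alone would have accepted it, because every new row gets a fresh id.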

Python/SQLAlchemy to insert if it does not exist

    import pandas as pd
    from sqlalchemy import create_engine, text

    cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
            'Price': [22000,25000,27000,35000]
            }

    df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

    engine = create_engine('mysql+pymysql://sniffer:sniffer@127.0.0.1/sniffer')

    temptable = "temp"   # staging table that receives the whole DataFrame
    table = "car"        # real table
    key = ["Brand"]      # column(s) that define uniqueness

    # load every row into the staging table
    df.to_sql(name=temptable, con=engine, index=False, if_exists='append')

    # copy across only the rows whose key is not already in the real table
    transfersql = f"""insert into {table} ({",".join(df.columns)}) 
                     select * from {temptable} t 
                     where not exists 
                       (select 1 from {table} m 
                       where {"and".join([f" t.{col} = m.{col} " for col in key])}
                       )"""
    print(transfersql)

    # engine.begin() commits on success and rolls back on error.
    # Raw SQL strings must be wrapped in text() in SQLAlchemy 2.0
    # (passing plain strings is deprecated in 1.4).
    with engine.begin() as conn:
        conn.execute(text(transfersql))
        conn.execute(text(f"drop table {temptable}"))

Output (generated SQL)

    insert into car (Brand,Price) 
                     select * from temp t 
                     where not exists 
                       (select 1 from car m 
                       where  t.Brand = m.Brand 
                       )
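To try the staging-table pattern without a MariaDB server, here is a minimal sketch using the standard-library sqlite3 module. The SQLite backend, the helper name `insert_missing` and the staging table name `car_staging` are all assumptions made for illustration and are not part of the answer above. The helper lists the columns explicitly instead of using `select *`, so it does not depend on column order in the staging table.

```python
import sqlite3

def insert_missing(conn, table, temptable, columns, key):
    """Copy rows from temptable into table unless a row with the same key exists.

    Hypothetical helper: table and column names are interpolated into the SQL,
    so they must come from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    src_cols = ", ".join(f"t.{col}" for col in columns)
    match = " and ".join(f"t.{col} = m.{col}" for col in key)
    conn.execute(
        f"insert into {table} ({cols}) "
        f"select {src_cols} from {temptable} t "
        f"where not exists (select 1 from {table} m where {match})"
    )

conn = sqlite3.connect(":memory:")
conn.execute("create table car (id integer primary key autoincrement, "
             "Brand text not null unique, Price real)")
conn.execute("create table car_staging (Brand text, Price real)")

# The real table already holds one brand.
conn.execute("insert into car (Brand, Price) values ('Honda Civic', 22000)")

# The staging table receives the whole batch, including the existing brand.
batch = [("Honda Civic", 22500), ("Toyota Corolla", 25000),
         ("Ford Focus", 27000), ("Audi A4", 35000)]
conn.executemany("insert into car_staging (Brand, Price) values (?, ?)", batch)

insert_missing(conn, "car", "car_staging", ["Brand", "Price"], ["Brand"])
conn.execute("drop table car_staging")
conn.commit()

brands = [row[0] for row in conn.execute("select Brand from car order by Brand")]
honda_price = conn.execute(
    "select Price from car where Brand = 'Honda Civic'").fetchone()[0]
print(brands, honda_price)
```

Only the three new brands are copied across. The existing Honda Civic row keeps its original price, because the pattern skips matching rows rather than updating them.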

Solution 1
0 2020-07-30 08:52:50
