Python3, using Pandas function df.to_sql() with SQLAlchemy to insert dataframe into PostgreSQL database

How do I specify the primary key of the table? If doing this with standard SQL I would use:

    age SMALLINT    NOT NULL,\
    gen TEXT        NOT NULL,\
    eth TEXT        NOT NULL,\
    biu BOOLEAN     NOT NULL,\
    pst TEXT        NOT NULL,\
    jse TEXT        NOT NULL,\
    emp TEXT        NOT NULL,\
    inc INT         NOT NULL,\
    own BOOLEAN     NOT NULL,\
    edu INT         NOT NULL,\
    res TEXT        NOT NULL);

And then fill the table using INSERT queries. The pandas function df.to_sql() does not seem to let me define the primary key. I can pass a 'dtype' dict to set the datatype of each column, but I can't find a way to specify the primary key or 'NOT NULL'.

I have a bit of experience with SQLite with Python, but I'm totally new to PostgreSQL.

As far as I know, pandas does not let you specify a PRIMARY KEY or other SQL constraints when using to_sql(). A common solution is to create the table yourself first, using SQLAlchemy or psycopg2, such as:

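    import psycopg2

    # your_hostname, your_port, etc. are placeholders for your own connection settings
    conn_string = "host=" + your_hostname + " port=" + your_port + " dbname=" + your_database + " user=" + your_uid + " password=" + your_pwd
    conn = psycopg2.connect(conn_string)
    cur = conn.cursor()

    # "table" is a reserved word in PostgreSQL: double-quote it or use your real table name
    cur.execute("""CREATE TABLE "table" (
                   uid TEXT PRIMARY KEY NOT NULL,
                   age SMALLINT    NOT NULL,
                   gen TEXT        NOT NULL,
                   eth TEXT        NOT NULL,
                   and so on...);""")
    conn.commit()  # psycopg2 does not autocommit by default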

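Then insert the data with "INSERT INTO" queries. If you want to insert the data directly with df.to_sql() instead, you need to pass it an engine or connection created by SQLAlchemy rather than the psycopg2 connection, and set if_exists='append' so pandas adds rows to the existing table instead of failing because it already exists.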
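For the to_sql() route, here is a minimal sketch of the whole flow: create the table with its constraints, then append the dataframe to it. It assumes an in-memory SQLite database as a stand-in so it runs without a server; the table name week, the three columns, and the sample rows are made up for illustration. For PostgreSQL you would swap in a postgresql+psycopg2 URL with your own credentials.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# Assumption: in-memory SQLite stands in for PostgreSQL so this runs anywhere.
# A PostgreSQL URL has the form "postgresql+psycopg2://user:password@host:port/dbname".
engine = create_engine("sqlite://")

# 1. Create the table yourself so the constraints are exactly what you want.
#    (Hypothetical table name and a shortened column list.)
with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE week (
            uid TEXT PRIMARY KEY NOT NULL,
            age SMALLINT NOT NULL,
            gen TEXT NOT NULL
        )
    """))

# 2. Hypothetical sample data whose column names match the table.
df = pd.DataFrame({"uid": ["a1", "b2"], "age": [34, 51], "gen": ["f", "m"]})

# 3. Append into the existing table. index=False keeps pandas from
#    trying to write the dataframe index as an extra column.
df.to_sql("week", engine, if_exists="append", index=False)

# 4. Confirm that the primary key is still in place and the rows arrived.
pk_columns = inspect(engine).get_pk_constraint("week")["constrained_columns"]
with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM week")).scalar()
```

Avoid if_exists='replace' here: it drops the table and lets pandas recreate it, which would lose the constraints you defined.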

I hope this helps.
