I'm writing a Python script that accesses my Postgres database. I'm also importing a .csv file and transferring its data to the Postgres database.
I have something like this, and I apologize if the formatting is off, as I'm new here.
Table foo:
A SERIAL NOT NULL PRIMARY KEY
B
C
Table bar:
D INT NOT NULL PRIMARY KEY
E
F INT REFERENCES foo(A)
My code looks something like this:
import csv
import psycopg2

csvfile = open('something.csv', newline='')
conn = psycopg2.connect("dbname=something user=something password=something")
cur = conn.cursor()
datareader = csv.DictReader(csvfile, delimiter=',')
for row in datareader:
    # The third value should be the matching foo.A; this is the part I'm missing.
    cur.execute("INSERT INTO bar (D, E, F) VALUES (%s, %s, %s)", (row['some_int'], row['some_char'], not_sure_what_goes_here))
I do not know how to pull the serial value from table foo.
Basically (but keep reading!), you want to INSERT a row into foo first and use the automatically generated A for the next INSERT into bar.
You could do that with two round trips to the DB, but that would be needlessly expensive and much more susceptible to concurrency issues. You want to get this done in a single trip to the DB.
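For comparison, the two-round-trip version would look roughly like the sketch below. It assumes cur is an open psycopg2 cursor; the function name insert_pair and its parameter names are made up for illustration.

```python
def insert_pair(cur, val_for_b, val_for_c, val_for_d, val_for_e):
    """Two-round-trip variant: insert into foo, read back A, insert into bar."""
    # Round trip 1: RETURNING hands back the generated serial value.
    cur.execute(
        "INSERT INTO foo (B, C) VALUES (%s, %s) RETURNING A",
        (val_for_b, val_for_c),
    )
    new_a = cur.fetchone()[0]
    # Round trip 2: use that value as the foreign key in bar.
    cur.execute(
        "INSERT INTO bar (D, E, F) VALUES (%s, %s, %s)",
        (val_for_d, val_for_e, new_a),
    )
    return new_a
```

This works, but every CSV row costs two statements instead of one.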
In Postgres 9.1+ you can use a data-modifying CTE to execute two inserts in one statement:
cur.execute("""WITH ins AS (
   INSERT INTO foo (B, C)
   VALUES (%s, %s)
   RETURNING A
   )
INSERT INTO bar (D, E, F)
SELECT %s, %s, A
FROM   ins""", (val_for_b, val_for_c, val_for_d, val_for_e))
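Plugged into the CSV loop from the question, this could look like the sketch below. The column names some_b and some_c are hypothetical (the question does not say where the values for B and C come from), and load_rows is just an illustrative helper name; cur is assumed to be an open psycopg2 cursor.

```python
import csv

# Same data-modifying CTE: one statement performs both inserts.
CTE_SQL = """
WITH ins AS (
   INSERT INTO foo (B, C)
   VALUES (%s, %s)
   RETURNING A
   )
INSERT INTO bar (D, E, F)
SELECT %s, %s, A
FROM   ins
"""

def load_rows(cur, csvfile):
    """Run the two-insert statement once per CSV row; return the row count."""
    count = 0
    for row in csv.DictReader(csvfile, delimiter=','):
        cur.execute(CTE_SQL, (
            row['some_b'],     # hypothetical column feeding foo.B
            row['some_c'],     # hypothetical column feeding foo.C
            row['some_int'],   # bar.D, as in the question
            row['some_char'],  # bar.E, as in the question
        ))
        count += 1
    return count
```

You would call it with the file opened as in the question and then run conn.commit() so the inserts are persisted.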
However, the above code implies that you can insert (B, C) into foo unconditionally. That's rarely the case. Typically you have some kind of unique constraint on foo and want to insert (B, C) only if it's not there yet. You have a "SELECT-or-INSERT" problem, which is further complicated by race conditions if concurrent write operations are possible. The best solution depends on your exact requirements.
To solve the "SELECT-or-INSERT" problem and also minimize chances for a race condition:
cur.execute("""WITH param(_B, _C) AS (SELECT %s::text, %s::text)
, sel AS (SELECT A FROM foo, param WHERE B = _B AND C = _C FOR SHARE)
, ins AS (
   INSERT INTO foo (B, C)
   SELECT _B, _C FROM param
   WHERE  NOT EXISTS (SELECT 1 FROM sel)
   RETURNING A
   )
INSERT INTO bar (D, E, F)
SELECT %s, %s, COALESCE(sel.A, ins.A)
FROM   sel FULL JOIN ins USING (A)""", (val_for_b, val_for_c, val_for_d, val_for_e))
Replace the type cast ::text with your actual data types (which you did not provide).
This should be good enough for most use cases. It still leaves a tiny chance for a race condition when two transactions try to insert the same row in foo concurrently.
To be perfectly sure, consider a server-side function, as discussed in this related answer. You'll find a detailed explanation of the above SQL code there as well.
Then your call can be:
cur.execute("""INSERT INTO bar (D, E, F)
VALUES (%s, %s, f_foo_id(%s, %s))""", (val_for_d, val_for_e, val_for_b, val_for_c))
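The definition of f_foo_id is not shown here. One possible shape for it, created once from Python, is sketched below. This is an assumption, not necessarily the function from the related answer: it presumes a UNIQUE constraint on foo (B, C), text columns, an integer key, and Postgres 9.5+ for ON CONFLICT. The helper name create_f_foo_id is made up.

```python
# Assumed definition of f_foo_id; adjust the types to your actual columns.
CREATE_F_FOO_ID = """
CREATE OR REPLACE FUNCTION f_foo_id(_b text, _c text, OUT _a int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      -- Return the key of the existing row, if there is one.
      SELECT A INTO _a FROM foo WHERE B = _b AND C = _c FOR SHARE;
      EXIT WHEN FOUND;

      -- Otherwise try to insert; if a concurrent transaction got there
      -- first, nothing is inserted and the loop selects again.
      INSERT INTO foo (B, C) VALUES (_b, _c)
      ON CONFLICT (B, C) DO NOTHING
      RETURNING A INTO _a;
      EXIT WHEN FOUND;
   END LOOP;
END
$func$
"""

def create_f_foo_id(cur):
    """Create (or replace) the helper function; run once before loading."""
    cur.execute(CREATE_F_FOO_ID)
```

The loop is what closes the remaining race: a lost insert simply falls through to another SELECT.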