I need to update each row with the CSV filename and the time the data was inserted into the database. I can use the code below to insert data from a CSV into the database:
with open('my.csv', 'r') as f:
    next(f)  # skip the header row
    cur.copy_from(f, 'csv_import', sep=',')
But for my requirement, along with the CSV data, there are 2 more columns which need to be populated. How can we achieve this?
The timestamp values can be handled with a trigger, and the CSV file itself would need to be handled from the operating system; if it is Linux, the load and the file rename could be added to a batch file or a cron schedule. Trigger:
CREATE OR REPLACE FUNCTION csvvalues()
RETURNS TRIGGER AS
$body$
BEGIN
    NEW.timestamp = NOW();
    RETURN NEW;
END;
$body$
LANGUAGE plpgsql;

CREATE TRIGGER csvvalues BEFORE INSERT ON csv_import
FOR EACH ROW EXECUTE FUNCTION csvvalues();
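The trigger assumes csv_import already has the column it fills (timestamp here). If the table does not have it yet, something along these lines would add it (the file_name column is illustrative, matching the question's requirement):

ALTER TABLE csv_import
    ADD COLUMN timestamp timestamptz,
    ADD COLUMN file_name varchar;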
and the command line to be executed from the directory holding the file:
psql -U postgres -h 192.168.1.100 -p 5432 -d database -c "\copy csv_import FROM 'my.csv' WITH (FORMAT csv, HEADER true)"
Password for user postgres:
mv file1.csv file2.csv
If you are running this from outside the server, you can install the psql client for your current PostgreSQL version and run the command line from whatever operating system you are using.
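For example, a cron entry along these lines could automate the load and the rename (the schedule, the /data paths, and a ~/.pgpass entry supplying the password are assumptions):

# Hypothetical crontab line: load the CSV at 02:00 daily, then rename it
# so it is not picked up again. Assumes ~/.pgpass supplies the password.
0 2 * * * psql -U postgres -h 192.168.1.100 -p 5432 -d database -c "\copy csv_import FROM '/data/file1.csv' WITH (FORMAT csv, HEADER true)" && mv /data/file1.csv /data/file2.csv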
Setup:
cat test.csv
1,john123@gmail.com,"John Stokes"
2,emily123@gmail.com,"Emily Ray"
create table csv_add(id integer, mail varchar, name varchar, file_name varchar, ts_added timestamptz);
Code to add file name and timestamp:
import csv
from datetime import datetime
from io import StringIO

import psycopg2

con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()

with open('test.csv') as f:
    reader = csv.reader(f)
    file_name = f.name
    out = StringIO()
    for row in reader:
        # Append the source file name and the current timestamp to each row.
        out.write(','.join(row + [file_name, datetime.now().isoformat()]) + '\n')

out.seek(0)
cur.copy_from(out, 'csv_add', sep=',')
con.commit()
select * from csv_add ;
 id |        mail        |    name     | file_name |            ts_added
----+--------------------+-------------+-----------+--------------------------------
  1 | john123@gmail.com  | John Stokes | test.csv  | 12/18/2022 09:58:20.475244 PST
  2 | emily123@gmail.com | Emily Ray   | test.csv  | 12/18/2022 09:58:20.475256 PST
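Note that test.csv here has no header line; if your file does have one (as in the question), skip it before the loop, e.g.:

next(reader)  # assumption: the first line is a header, so skip it before writing rows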
Sorry, gonna stick to pseudo-code (hopefully I got the raw SQL syntax right; I don't usually write SQL raw).
What I have often seen done is to use two tables: a staging/import table and the real table.
So...
from datetime import datetime

with open('my.csv', 'r') as f:
    next(f)  # skip the header row
    cur.copy_from(f, 'csv_import', sep=',')

cur.execute("""
    insert into actual (f1, f2, ts, filename)
    select f1, f2, %(ts)s, %(filename)s
    from csv_import""",
    dict(ts=datetime.now(), filename="my.csv")
)
cur.execute("delete from csv_import")
Don't try to get fancy with * on the right side to skip the field list - that depends on column order in the db, which may change with ALTER TABLEs later. Been there, done that - picking up after someone did just that.
This supports higher volumes easily and does not clutter your db with triggers. You can also reuse the staging table to get feeds from elsewhere, say an email gateway.
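If the load, the insert into the real table, and the staging cleanup should succeed or fail together, a sketch of the same flow wrapped in one transaction (assuming psycopg2 and the tables above) looks like this:

from datetime import datetime

import psycopg2

con = psycopg2.connect(dbname="test", host="localhost", user="postgres", port=5432)

# psycopg2 starts a transaction implicitly on the first statement; nothing
# becomes visible to other sessions until commit(), and any exception
# rolls the whole batch back.
try:
    with con.cursor() as cur:
        with open('my.csv', 'r') as f:
            next(f)  # skip the header row
            cur.copy_from(f, 'csv_import', sep=',')
        cur.execute("""
            insert into actual (f1, f2, ts, filename)
            select f1, f2, %(ts)s, %(filename)s
            from csv_import""",
            dict(ts=datetime.now(), filename="my.csv"))
        cur.execute("delete from csv_import")
    con.commit()
except Exception:
    con.rollback()
    raise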