Look for and transfer new data between two tables using a timestamp id

What the following script is supposed to do:

  1. Connect to the PostgreSQL database
  2. Grab the last id entry in the database's final table
  3. Compare that entry to data uploaded into a staging table from a .csv file, using the [id] column (to avoid duplicates)
  4. Insert data from the staging table into the final table (only entries whose timestamp id is greater than the last entry from the previous data)
  5. Truncate the staging table

The code as written below works, but is unfinished. I am at the point where I have to compare timestamp1 (or t1) to the id column in the staging table, but I'm unsure how to go about that.

This is the spot in the code:

#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####

I am hoping for a bit of assistance or guidance with what needs to be done. I'm new to Python and it has taken me a good deal of effort to get this far. I'm sure a for loop is required, but I'm not sure how to apply the timestamp format "%Y/%m/%d %H:%M:%S.%f" to the id column. When applied correctly, the difference between a new entry's timestamp id and the last stored one should be positive, while for entries that already exist it should be zero or negative. Some may suggest that a MERGE INTO would work, but at the moment the final table will continually collect data without truncating any earlier uploads, so (to my understanding) comparing data with the MERGE method would take longer and longer over time.
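The sign test described above can be tried in isolation before touching the database. The following is only a minimal sketch: the helper name `is_newer` and the sample timestamps are made up for illustration, and it assumes every id string follows the "%Y/%m/%d %H:%M:%S.%f" format quoted in the question.

```python
from datetime import datetime

# Format quoted in the question for the id column.
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def is_newer(candidate_id: str, last_id: str) -> bool:
    """Return True when candidate_id is strictly later than last_id."""
    difference = (datetime.strptime(candidate_id, TS_FORMAT)
                  - datetime.strptime(last_id, TS_FORMAT))
    # Positive difference: new entry. Zero or negative: already stored.
    return difference.total_seconds() > 0


# Hypothetical sample values, not taken from the original data.
last_id = "2020/01/15 10:30:00.250000"
print(is_newer("2020/01/15 10:30:00.500000", last_id))  # True
print(is_newer("2020/01/15 10:30:00.250000", last_id))  # False
print(is_newer("2020/01/14 23:59:59.999999", last_id))  # False
```

If every id is zero-padded to the same width, comparing the raw strings should give the same ordering as comparing the parsed datetimes, which is what makes a plain comparison inside SQL plausible.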

import csv
import pyodbc
import time
from datetime import datetime

#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=sqldatabase')
cursor = con.cursor()
print("...Connected to database.")

#recall last timestamp entry in final db table
timestamp1 = cursor.execute("select max(id) from test;").fetchval()

#read file and copy data into staging table
print("Reading file contents and copying into staging table...")
with open('C:\\Users\\user\\Desktop\\test2.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV) #skips the header row
    query = 'insert into stagingtable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        cursor.execute(query, data)
    con.commit()

    timestamp2 = cursor.execute("select max(id) from stagingtable;").fetchval()
    t1 = datetime.strptime(timestamp1, "%Y/%m/%d %H:%M:%S.%f")
#    t2 = datetime.strptime(timestamp2, "%Y/%m/%d %H:%M:%S.%f")
#    difference = t2 - t1
#    print(difference)

#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####

#clear staging table
print("Clearing previous data download...")
cursor.execute("TRUNCATE TABLE stagingtable")
con.commit()
con.close()
print("...Completed clearing staging table.")
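One possible way to finish the unfinished INSERT is to let the database do the comparison and bind the last id as a query parameter, so no Python loop is needed. The sketch below is not the original poster's solution. It uses the standard-library `sqlite3` module as a stand-in for the pyodbc/PostgreSQL connection so that it runs anywhere, and the two-column table layout and sample rows are hypothetical. It also assumes the id values are fixed-width, zero-padded strings (or a real timestamp column), so that `>` orders them chronologically.

```python
import sqlite3

# sqlite3 stands in for the pyodbc/PostgreSQL connection so the sketch is runnable.
con = sqlite3.connect(":memory:")
cursor = con.cursor()

# Hypothetical two-column layout; the real tables' columns are not shown in the question.
cursor.execute("CREATE TABLE test (id TEXT, value REAL)")
cursor.execute("CREATE TABLE stagingtable (id TEXT, value REAL)")
cursor.executemany("INSERT INTO test VALUES (?, ?)", [
    ("2020/01/15 10:30:00.000000", 1.0),
    ("2020/01/15 10:31:00.000000", 2.0),
])
cursor.executemany("INSERT INTO stagingtable VALUES (?, ?)", [
    ("2020/01/15 10:31:00.000000", 2.0),  # already present in test
    ("2020/01/15 10:32:00.000000", 3.0),  # new
    ("2020/01/15 10:33:00.000000", 4.0),  # new
])

# Latest id already stored in the final table.
last_id = cursor.execute("SELECT MAX(id) FROM test").fetchone()[0]

# Copy only staging rows with a later id; last_id is bound as a parameter.
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE id > ?", (last_id,))

# sqlite3 has no TRUNCATE statement, so DELETE empties the staging table here.
cursor.execute("DELETE FROM stagingtable")
con.commit()

total_rows = cursor.execute("SELECT COUNT(*) FROM test").fetchone()[0]
staging_rows = cursor.execute("SELECT COUNT(*) FROM stagingtable").fetchone()[0]
print(total_rows, staging_rows)
con.close()
```

With pyodbc the `?` parameter marker works the same way, and `fetchval()` can replace `fetchone()[0]`. One case the sketch does not handle: if the final table is empty, `MAX(id)` is NULL and the comparison matches no rows, so a first load would need separate handling.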
# Revised script: rows go straight from the CSV into the final table, with no staging table.
import csv
import pyodbc
from datetime import datetime

TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"

#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=SQLdatabase')
cursor = con.cursor()
print("...Connected to database.")

#recall last timestamp entry in db table
t1 = datetime.strptime(cursor.execute("SELECT MAX(id) FROM test;").fetchval(), TS_FORMAT)

#read file and copy only rows newer than t1 into table
print("Reading file contents and copying into table...")
with open('C:\\Users\\user\\Desktop\\test2.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV) #reads the header row to get the column names
    query = 'insert into test({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        #first column holds the timestamp id; rows at or before t1 are already stored
        if datetime.strptime(data[0], TS_FORMAT) > t1:
            cursor.execute(query, data)
    con.commit()
con.close()
print("Data posted to table")

I did away with the staging table. This was the final outcome.
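The idea of inserting only the CSV rows that are newer than the latest stored timestamp can be exercised without a database connection. The sketch below is a hedged illustration: the helper `rows_newer_than` and the in-memory sample data are hypothetical, and it assumes the timestamp id is the first column of each row, as in the script above.

```python
import csv
import io
from datetime import datetime

TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def rows_newer_than(csv_file, last_timestamp):
    """Return (header, rows), keeping rows whose first column is later than last_timestamp."""
    reader = csv.reader(csv_file, delimiter=',')
    header = next(reader)  # column names from the header row
    new_rows = [row for row in reader
                if row and datetime.strptime(row[0], TS_FORMAT) > last_timestamp]
    return header, new_rows


# Hypothetical CSV content standing in for test2.csv.
sample = io.StringIO(
    "id,value\n"
    "2020/01/15 10:30:00.000000,1.0\n"
    "2020/01/15 10:31:00.000000,2.0\n"
    "2020/01/15 10:32:00.000000,3.0\n"
)
t1 = datetime.strptime("2020/01/15 10:30:00.000000", TS_FORMAT)
header, new_rows = rows_newer_than(sample, t1)
print(header)
print(new_rows)
```

Filtering every row, rather than skipping ahead until the last stored timestamp is found, does not depend on the CSV being sorted or on it containing the last stored row.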
