Look for and transfer new data between two tables using a timestamp id
What the following script is supposed to do:
The code as written below works, but is unfinished. I have reached the point where I have to compare timestamp1 (or t1) to the id column in the staging table, but I'm unsure how to go about that.
This is the spot in the code:
#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####")  # WHERE condition still to be written
I am hoping for some assistance or guidance on what needs to be done. My Python skills are new, and it has taken me a good deal of effort to get this far. I'm sure a for loop is required, but I'm not sure how to apply the timestamp format "%Y/%m/%d %H:%M:%S.%f" to the id column. When applied correctly, the difference between a new entry's id and the last stored timestamp should be positive, while entries that already exist should give zero or a negative difference.
Some may suggest that MERGE INTO would work, but the final table will keep collecting data without truncating any earlier uploads. So, as I understand it, comparing data with MERGE will eventually take longer and longer.
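A minimal sketch of the comparison described above, using only the standard library. It assumes the id values are strings in the "%Y/%m/%d %H:%M:%S.%f" format; the function name is_new and the sample timestamps are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta

ID_FORMAT = "%Y/%m/%d %H:%M:%S.%f"  # format of the id column, as stated in the question

def is_new(row_id: str, last_id: str) -> bool:
    """Return True if row_id is strictly later than last_id (hypothetical helper)."""
    difference = datetime.strptime(row_id, ID_FORMAT) - datetime.strptime(last_id, ID_FORMAT)
    # positive difference -> new entry; zero or negative -> already in the final table
    return difference > timedelta(0)

last_id = "2020/05/01 12:00:00.000000"  # hypothetical value of max(id) from the final table
print(is_new("2020/05/01 12:00:00.250000", last_id))  # True
print(is_new("2020/05/01 12:00:00.000000", last_id))  # False
print(is_new("2020/04/30 23:59:59.999999", last_id))  # False
```

Comparing the two datetime objects directly (t2 > t1) gives the same answer without computing the difference; the timedelta form just mirrors the "positive vs. zero or negative" reasoning above.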
import csv
import pyodbc
import time
from datetime import datetime
#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=sqldatabase')
cursor = con.cursor()
print("...Connected to database.")
#recall last timestamp entry in final db table
timestamp1 = cursor.execute("select max(id) from test;").fetchval()
#read file and copy data into staging table
print("Reading file contents and copying into staging table...")
with open('C:\\Users\\user\\Desktop\\test2.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV) #skips the header row
    query = 'insert into stagingtable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        cursor.execute(query, data)
    con.commit()
timestamp2 = cursor.execute("select max(id) from stagingtable;").fetchval()
t1 = datetime.strptime(timestamp1, "%Y/%m/%d %H:%M:%S.%f")
# t2 = datetime.strptime(timestamp2, "%Y/%m/%d %H:%M:%S.%f")
# difference = t2 - t1
# print(difference)
#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####")  # WHERE condition still to be written
#clear staging table
print("Clearing previous data download...")
cursor.execute("TRUNCATE TABLE stagingtable")
con.commit()
con.close()
print("...Completed clearing staging table.")
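One way to fill in the WHERE #### placeholder is to pass timestamp1 as a query parameter and keep only staging rows with a larger id. The sketch below uses Python's built-in sqlite3 module as a stand-in for the SQL Server connection so it can run anywhere; pyodbc also uses ? placeholders, so the SQL string should carry over, but that is an assumption to check against the real database. It also assumes id is a text column in a fixed-width, zero-padded format, so that string order matches time order. The helper name copy_new_rows and the (id, value) table layout are hypothetical.

```python
import sqlite3

def copy_new_rows(con: sqlite3.Connection) -> None:
    """Copy staging rows whose id is later than the newest id in test (hypothetical helper)."""
    cur = con.cursor()
    # recall last timestamp entry in the final table (None if the table is empty)
    last_id = cur.execute("SELECT MAX(id) FROM test").fetchone()[0]
    if last_id is None:
        cur.execute("INSERT INTO test SELECT * FROM stagingtable")
    else:
        # parameterised comparison of the id column against timestamp1
        cur.execute("INSERT INTO test SELECT * FROM stagingtable WHERE id > ?", (last_id,))
    con.commit()

# in-memory demo data (hypothetical)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id TEXT, value TEXT)")
con.execute("CREATE TABLE stagingtable (id TEXT, value TEXT)")
con.execute("INSERT INTO test VALUES ('2020/05/01 12:00:00.000000', 'old')")
con.executemany("INSERT INTO stagingtable VALUES (?, ?)", [
    ("2020/05/01 11:59:59.500000", "already loaded"),
    ("2020/05/01 12:00:00.000000", "duplicate of last row"),
    ("2020/05/01 12:00:01.000000", "new"),
])
copy_new_rows(con)
print(con.execute("SELECT value FROM test ORDER BY id").fetchall())  # [('old',), ('new',)]
```

The filter is evaluated against the staging table, so its cost tracks the size of each upload rather than how large test has grown; MAX(id) on test should stay cheap if id is indexed.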
import csv
import pyodbc
import time
from datetime import datetime
#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=SQLdatabase')
cursor = con.cursor()
print("...Connected to database.")
#recall last timestamp entry in db table
t1 = datetime.strptime(cursor.execute("SELECT MAX(id) FROM test;").fetchval(), "%Y/%m/%d %H:%M:%S.%f")
#read file and copy data into table
print("Reading file contents and copying into table...")
with open('C:\\Users\\user\\Desktop\\test2.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV) #skips the header row
    query = 'insert into test({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        # only insert rows whose timestamp id is later than the last one already in the table
        t2 = datetime.strptime(data[0], "%Y/%m/%d %H:%M:%S.%f")
        if t2 > t1:
            cursor.execute(query, data)
con.commit()
print("Data posted to table")
I did away with the staging table. This was the final outcome.
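The filtering step can be checked without a database by pulling it into a small function. This is a minimal sketch, assuming the id is the first CSV column as in the script above; the helper name rows_newer_than is hypothetical. It filters every row instead of stopping at the first new one, so it does not rely on the CSV being sorted, and it treats an empty final table (last_id is None) as "every row is new". The pyodbc lines are left as comments because they need a live connection.

```python
import csv
import io
from datetime import datetime
from typing import List, Optional, Tuple

ID_FORMAT = "%Y/%m/%d %H:%M:%S.%f"

def rows_newer_than(csvfile, last_id: Optional[str]) -> Tuple[List[str], List[List[str]]]:
    """Return the header and the data rows whose first column is later than last_id."""
    reader = csv.reader(csvfile, delimiter=',')
    columns = next(reader)  # header row
    if last_id is None:  # final table is empty: every row is new
        return columns, list(reader)
    cutoff = datetime.strptime(last_id, ID_FORMAT)
    new = [row for row in reader if datetime.strptime(row[0], ID_FORMAT) > cutoff]
    return columns, new

# hypothetical, deliberately unsorted sample file
sample = io.StringIO(
    "id,value\n"
    "2020/05/01 11:59:59.500000,a\n"
    "2020/05/01 12:00:01.000000,c\n"
    "2020/05/01 12:00:00.000000,b\n"
)
columns, new = rows_newer_than(sample, "2020/05/01 12:00:00.000000")
print(columns, new)  # ['id', 'value'] [['2020/05/01 12:00:01.000000', 'c']]

# With pyodbc, the filtered rows could then be sent in one batch:
# cursor.fast_executemany = True
# cursor.executemany(query, new)
```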