Python: importing data from large csv to the sqlite DB
I am trying to import data from a csv file into a sqlite db using a Python script.
My CSV rows are as follows (viewed in an Excel sheet; the file was downloaded from Quandl):
Date Open High Low Last Close TotTrQt Turnover (Lacs)
2017-05-26 2625 2626.85 2564.65 2570.05 2578.25 681275 17665.43
2017-05-25 2577 2637.55 2568 2615.05 2624.6 2047047 53333.77
2017-05-24 2534.8 2570 2529.65 2567.1 2559.15 1267274 32252.28
2017-05-23 2533.2 2564.15 2514 2523.7 2521.7 1374298 34776.45
2017-05-22 2510 2553.75 2510 2535 2531.35 831970 21054.61
2017-05-19 2536.2 2540.55 2486 2503.85 2507.15 893022 22384.3
2017-05-18 2450 2572 2442.25 2525 2536.2 2569297 64894.78
2017-05-17 2433.5 2460.75 2423 2450 2455.35 1438099 35137.29
2017-05-16 2380 2435 2373.45 2425.1 2429.15 1800513 43397.03
2017-05-15 2375.1 2377.95 2341.6 2368 2365.1 908802 21380.43
To create the DB table, I used the following script:
    import sqlite3
    try:
        db = sqlite3.connect('NSETCS')
        cursor = db.cursor()
        print 'Executing: Create Table SQL'
        cursor.execute('''CREATE TABLE NSETCS (DATE TEXT, OPEN REAL, HIGH REAL, LOW REAL, LAST REAL, CLOSE REAL,\
                          TOTALTRADEQUANTITY REAL, TURNOVER REAL)''')
        ## since the above statement is DDL, no explicit commit is required
    except Exception as E:
        print "Error=", E
    finally:
        db.close()
To insert the data into the rows of the table, I am using the following script. However, the insertion fails because the float conversion raises an error. Any guidance would be highly appreciated.
    import sqlite3
    try:
        infile = open(r'F:\mypractise_python\day11\NSE-TCS.csv', 'r')
        content = infile.readlines()
    except IOError as E:
        print "Error: ", E
    try:
        db = sqlite3.connect('NSETCS')
        cursor = db.cursor()
        for line in content:
            line = line.strip()
            columns = line.split(',')
            if line == '' or columns[0] == 'Date':
                continue
            date = columns[0].strip()
            open_stock = float(columns[1].strip())
            high = float(columns[2].strip())
            low = float(columns[3].strip())
            last = float(columns[4].strip())
            close = float(columns[5].strip())
            tot_trade_qt = float(columns[6].strip())
            turnover = float(columns[7].strip())
            cursor.execute('''insert into NSETCS values (:date, :open_stock, :high, :low, :last, :close, :tot_trade_qt, :turnover)''',
                           {'date': date, 'open_stock': open_stock, 'high': high, 'low': low, 'last': last, 'close': close,
                            'tot_trade_qt': tot_trade_qt, 'turnover': turnover})
    except Exception as E:
        print "Error:", E
    else:
        db.commit()
        db.close()
        infile.close()
The csv sample doesn't seem to match the code: the code skips one row (presumably the header) by comparing its first cell with a fixed string, and that comparison is case-sensitive. If the code checks for 'DATE' while the csv has 'Date', the header row is not skipped, and float('Open') fails with a ValueError.
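To make the failure mode concrete, here is a minimal sketch of how a missed header row turns into a float conversion error. It assumes Python 3 and a comma-separated file (as the `split(',')` in the question implies); the helper name `parse_rows` is hypothetical, and the sample rows are copied from the question.

```python
def parse_rows(lines, header_name):
    """Naively parse CSV lines, skipping rows whose first cell equals header_name."""
    parsed = []
    for line in lines:
        line = line.strip()
        columns = line.split(',')
        if line == '' or columns[0] == header_name:
            continue
        # float() raises ValueError on non-numeric text such as 'Open'
        parsed.append((columns[0], [float(c) for c in columns[1:]]))
    return parsed


sample = [
    "Date,Open,High,Low,Last,Close,TotTrQt,Turnover (Lacs)",
    "2017-05-26,2625,2626.85,2564.65,2570.05,2578.25,681275,17665.43",
]

rows = parse_rows(sample, 'Date')      # header matches and is skipped

try:
    parse_rows(sample, 'DATE')         # header does not match: float('Open') fails
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With the matching spelling the single data row is parsed; with the mismatched spelling the header row reaches `float()` and the conversion error appears.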
Consider using csv.DictReader: for each source row it produces a dict keyed by column name:
    import sqlite3
    import csv

    db = sqlite3.connect('NSETCS')
    # 'with db' commits if the block succeeds and rolls back on an exception
    with open(r'F:\mypractise_python\day11\NSE-TCS.csv', 'r') as infile, db:
        content = csv.DictReader(infile, delimiter=',')  # reads the file lazily, one row at a time
        cursor = db.cursor()
        for line in content:
            # line is a dict, where each column name is the key
            # no need to skip the header row; DictReader consumes it to get the keys
            date = line['Date']
            open_stock = float(line['Open'])
            high = float(line['High'])
            low = float(line['Low'])
            last = float(line['Last'])
            close = float(line['Close'])
            tot_trade_qt = float(line['TotTrQt'])
            turnover = float(line['Turnover (Lacs)'])
            cursor.execute('''insert into NSETCS values (:date, :open_stock, :high, :low, :last, :close, :tot_trade_qt, :turnover)''',
                           {'date': date, 'open_stock': open_stock, 'high': high, 'low': low, 'last': last, 'close': close,
                            'tot_trade_qt': tot_trade_qt, 'turnover': turnover})
    # no file closing needed, 'with open()' does that automatically
    # no explicit commit either: the connection's context manager commits if there were no exceptions
    # it does not close the connection, though, so close it explicitly
    db.close()
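For readers who want to try the approach without the original file, here is a self-contained sketch. It assumes Python 3, reads two rows from the question out of an in-memory `io.StringIO` instead of `NSE-TCS.csv`, and uses an in-memory SQLite database (`:memory:`). The helper name `load_csv` is hypothetical, and `executemany` with positional `?` placeholders is a swapped-in alternative to the per-row `execute`, not part of the answer itself.

```python
import csv
import io
import sqlite3

# Two rows from the question, assumed to be comma-separated in the real file
CSV_TEXT = """Date,Open,High,Low,Last,Close,TotTrQt,Turnover (Lacs)
2017-05-26,2625,2626.85,2564.65,2570.05,2578.25,681275,17665.43
2017-05-25,2577,2637.55,2568,2615.05,2624.6,2047047,53333.77
"""

NUMERIC_COLUMNS = ['Open', 'High', 'Low', 'Last', 'Close', 'TotTrQt', 'Turnover (Lacs)']


def load_csv(db, infile):
    """Insert every row of a CSV file object into the NSETCS table."""
    reader = csv.DictReader(infile)
    # Generator expression: rows are converted one at a time,
    # so a large file is never held in memory as a whole
    rows = ([row['Date']] + [float(row[name]) for name in NUMERIC_COLUMNS]
            for row in reader)
    with db:  # commits on success, rolls back if an exception is raised
        db.executemany('INSERT INTO NSETCS VALUES (?, ?, ?, ?, ?, ?, ?, ?)', rows)


db = sqlite3.connect(':memory:')
db.execute('''CREATE TABLE NSETCS (DATE TEXT, OPEN REAL, HIGH REAL, LOW REAL, LAST REAL,
              CLOSE REAL, TOTALTRADEQUANTITY REAL, TURNOVER REAL)''')
load_csv(db, io.StringIO(CSV_TEXT))
stored = db.execute('SELECT DATE, OPEN, TURNOVER FROM NSETCS ORDER BY DATE').fetchall()
db.close()
```

Passing a generator to `executemany` keeps memory use flat for a large file, and wrapping the call in `with db` means either all rows are committed or none are.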
My main point is: do not parse the raw csv manually, use the builtin library for that. It will save you a lot of trouble.
For example, you try to handle the empty lines and to skip the header row, but there are other cases in csv, such as data quoting and escaping, which the library will handle for you.
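As a small illustration of the quoting case (a sketch in Python 3 with a made-up row, not data from the question): a quoted field that contains a comma is cut in two by `str.split(',')`, while `csv.reader` returns it as a single field.

```python
import csv
import io

# Hypothetical row: the second value is quoted and contains a comma
line = '2017-05-26,"2,625",2626.85'

naive_fields = line.split(',')                     # splits inside the quotes
csv_fields = next(csv.reader(io.StringIO(line)))   # honours the quotes
```

Here `naive_fields` has four pieces with stray quote characters, while `csv_fields` has the three intended fields. Note that the value `2,625` would still need its comma removed before being passed to `float()`.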