
Python: importing data from a large CSV to a SQLite DB

I am trying to import data from a CSV file into a SQLite DB using a Python script.

My CSV rows are as follows (from an Excel sheet downloaded from Quandl):

Date       Open    High      Low    Last    Close   TotTrQt Turnover (Lacs)
2017-05-26  2625    2626.85 2564.65 2570.05 2578.25 681275  17665.43
2017-05-25  2577    2637.55 2568    2615.05 2624.6  2047047 53333.77
2017-05-24  2534.8  2570    2529.65 2567.1  2559.15 1267274 32252.28
2017-05-23  2533.2  2564.15 2514    2523.7  2521.7  1374298 34776.45
2017-05-22  2510    2553.75 2510    2535    2531.35 831970  21054.61
2017-05-19  2536.2  2540.55 2486    2503.85 2507.15 893022  22384.3
2017-05-18  2450    2572    2442.25 2525    2536.2  2569297 64894.78
2017-05-17  2433.5  2460.75 2423    2450    2455.35 1438099 35137.29
2017-05-16  2380    2435    2373.45 2425.1  2429.15 1800513 43397.03
2017-05-15  2375.1  2377.95 2341.6  2368    2365.1  908802  21380.43

To create the DB table, I have used the following script:

import sqlite3

try:
    db = sqlite3.connect('NSETCS')
    cursor = db.cursor()
    print 'Executing: Create Table SQL'
    cursor.execute('''CREATE TABLE NSETCS (DATE TEXT, OPEN REAL, HIGH REAL, LOW REAL, LAST REAL, CLOSE REAL,
                      TOTALTRADEQUANTITY REAL, TURNOVER REAL)''')
    # since the above statement is DDL, no explicit commit is required
except Exception as E:
    print "Error=", E
finally:
    db.close()

To insert the data into the table, I am using the following script. However, the insertion fails because the float conversion raises an error; any guidance would be highly appreciated.

import sqlite3

try:
    infile = open(r'F:\mypractise_python\day11\NSE-TCS.csv', 'r')
    content = infile.readlines()
except IOError as E:
    print "Error: ", E
try:
    db = sqlite3.connect('NSETCS')
    cursor = db.cursor()

    for line in content:
        line = line.strip()
        columns = line.split(',')
        if line == '' or columns[0] == 'Date':
            continue
        date = columns[0].strip()
        open_stock = float(columns[1].strip())
        high = float(columns[2].strip())
        low = float(columns[3].strip())
        last = float(columns[4].strip())
        close = float(columns[5].strip())
        tot_trade_qt = float(columns[6].strip())
        turnover = float(columns[7].strip())
        cursor.execute('''insert into NSETCS values (:date, :open_stock, :high, :low, :last, :close, :tot_trade_qt, :turnover)''',
                       {'date': date, 'open_stock': open_stock, 'high': high, 'low': low, 'last': last, 'close': close,
                        'tot_trade_qt': tot_trade_qt, 'turnover': turnover})

except Exception as E:
    print "Error:", E
else:
    db.commit()

db.close()
infile.close()

The CSV sample doesn't match the code: in the code you skip a row (presumably the header) whose first cell is 'DATE', while in the CSV it is 'Date'. If that comparison fails, the header row is never skipped, and the very first float() call then tries to convert the string 'Open', which is exactly the kind of conversion error you are seeing.
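If you want to keep the manual parsing, a minimal fix (a sketch of the changed check, assuming the header only differs in case) is to compare the first cell case-insensitively so the header row is reliably skipped:

        # skip blank lines and the header row, whatever its capitalisation
        if line == '' or columns[0].strip().lower() == 'date':
            continue

That said, hand-rolling the parsing is fragile.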

Consider using csv.DictReader: it produces a dict for each source row, with the column names as keys:

import sqlite3
import csv

db = sqlite3.connect('NSETCS')

with open(r'F:\mypractise_python\day11\NSE-TCS.csv', 'r') as infile, db:
    content = csv.DictReader(infile, delimiter=',')  # reads the file lazily, one row at a time


    cursor = db.cursor()

    for line in content:
        # line is a dict, where each column name is the key
        # no need to sanitize the header row, that was done automatically upon reading the file
        date = line['Date']
        open_stock = float(line['Open'])
        high = float(line['High'])
        low = float(line['Low'])
        last = float(line['Last'])
        close = float(line['Close'])
        tot_trade_qt = float(line['TotTrQt'])
        turnover = float(line['Turnover (Lacs)'])
        cursor.execute('''insert into NSETCS values (:date, :open_stock, :high, :low, :last, :close, :tot_trade_qt, :turnover)''',
                       {'date': date, 'open_stock': open_stock, 'high': high, 'low': low, 'last': last, 'close': close,
                        'tot_trade_qt': tot_trade_qt, 'turnover': turnover})

# no file closing needed: 'with open()' does that automatically
# no explicit commit needed: the DB connection is used as a context manager, which commits automatically if there are no exceptions (and rolls back otherwise)
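Since your title mentions a large CSV, here is one possible variation (a sketch, not tested against your data) that streams the rows into Connection.executemany via a generator, so the file is never held in memory as Python objects:

import csv
import sqlite3

def rows(path):
    # yield one parameter tuple per CSV row, converting the numeric columns
    with open(path, 'r') as infile:
        for line in csv.DictReader(infile):
            yield (line['Date'], float(line['Open']), float(line['High']),
                   float(line['Low']), float(line['Last']), float(line['Close']),
                   float(line['TotTrQt']), float(line['Turnover (Lacs)']))

db = sqlite3.connect('NSETCS')
with db:  # commits on success, rolls back on an exception
    db.executemany('INSERT INTO NSETCS VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
                   rows(r'F:\mypractise_python\day11\NSE-TCS.csv'))
db.close()

executemany also keeps all the inserts inside one transaction here, which matters for speed on a large file.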

My main point is: do not parse the raw CSV manually; use the built-in csv library instead. It will save you a lot of trouble.

For example, you try to handle empty lines and skip the header row yourself, but there are other cases in CSV, such as quoting and escaping, that the library will handle for you.
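A toy illustration (not from your file) of why that matters: a quoted field containing commas survives csv.reader but is shredded by a naive split:

import csv

row = 'AAPL,"1,000,000",123.45'

print(row.split(','))           # ['AAPL', '"1', '000', '000"', '123.45']
print(next(csv.reader([row])))  # ['AAPL', '1,000,000', '123.45']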
