
Python: Read a CSV and place values in a MySQL database

I am trying to get values from a CSV file and put them into the database, and I am managing to do this without a great deal of trouble.

But I now need to write back to the CSV file, so that the next time I run the script it will only enter the values into the DB from below the mark in the CSV file.

Note that the CSV file on the system automatically flushes every 24 hrs, so bear in mind there might not be a mark in the CSV. So basically, put all values in the database if no mark is found.

I am planning to run this script every 30 mins, so there could be 48 marks in the CSV file. Or should I remove the mark and move it down the file each time?

I have been deleting the file and then re-creating it in the script, so there is a new file on every script run, but this breaks the system somehow, so that is not a great option.

Hope you guys can help.

Thank you

Python Code:

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='******',
                       db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(open('data_csv.log'))

for row in csv_data:
    cursor.execute('INSERT INTO `heating` VALUES '
                   '(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

# Commit the inserts, then close the cursor and the connection.
mydb.commit()
cursor.close()
mydb.close()

print "Done"

My CSV file Format:

2013-02-21,21:42:00,-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2
2013-02-21,21:48:00,-1.0,45.8,27.5,17.3,13.9,22.3,20.9,1,1,2,2

It looks like the first fields in your MySQL table form a unique timestamp. It is possible to set up the MySQL table so that those fields must be unique, and to ignore INSERTs that would violate that uniqueness property. At a mysql> prompt, enter the command:

ALTER IGNORE TABLE heating ADD UNIQUE heatingidx (thedate, thetime);

(Change thedate and thetime to the names of the columns holding the date and time.)


Once you make this change to your database, you only need to change one line in your program to make MySQL ignore duplicate insertions:

cursor.execute('INSERT IGNORE INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)
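
To see the effect, here is a small sketch (the row values are taken from the sample CSV above, and the unique index from the ALTER statement is assumed to be in place):

import MySQLdb

mydb = MySQLdb.connect(host='localhost', user='root',
                       passwd='******', db='kestrel_keep')
cursor = mydb.cursor()

row = ['2013-02-21', '21:42:00', -1.0, 45.8, 27.6, 17.3,
       14.1, 22.3, 21.1, 1, 1, 2, 2]
sql = ('INSERT IGNORE INTO `heating` VALUES '
       '(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)')

cursor.execute(sql, row)
print cursor.rowcount  # 1: the row was inserted
cursor.execute(sql, row)
print cursor.rowcount  # 0: the duplicate was silently ignored
mydb.commit()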

Yes, it is a little wasteful to be running INSERT IGNORE ... on lines that have already been processed, but given the frequency of your data (every 6 minutes?), it is not going to matter much in terms of performance.

The advantage to doing it this way is that it is now impossible to accidentally insert duplicates into your table. It also keeps the logic of your program simple and easy to read.

It also avoids having two programs write to the same CSV file at the same time. Even if your program usually succeeds without error, every so often -- maybe once in a blue moon -- your program and the other program may try to write to the file at the same time, which could result in an error or mangled data.


You can also make your program a little faster by using cursor.executemany instead of cursor.execute :

rows = list(csv_data)
cursor.executemany('''INSERT IGNORE INTO `heating` VALUES
    (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)''', rows)

is equivalent to

for row in csv_data:
    cursor.execute('INSERT IGNORE INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

except that it packs all the data into one command.

I think that a better option than "marking" the CSV file is to keep a separate file where you store the number of the last line you processed.

So if that file (the one where you store the number of the last processed line) does not exist, you process the whole CSV file. If it does exist, you only process records after that line. A minimal sketch of the idea is shown below.
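
Here is a minimal sketch of that bookkeeping, using the same file names as the final script below (lastline for the saved line number, data_csv.log for the data); the database insert is left as a stub:

import csv
import os

start_row = 0
# Read the line number saved by the previous run, if any.
if os.path.exists('lastline'):
    with open('lastline') as f:
        start_row = int(f.read())

with open('data_csv.log') as csv_file:
    rows = list(csv.reader(csv_file))

for row in rows[start_row:]:
    pass  # insert `row` into the database here

# Remember how many lines have been processed so far.
with open('lastline', 'w') as f:
    f.write(str(len(rows)))

On its own this does not cope with the daily flush; comparing the current file size against the size saved on the previous run, as the final script below does, covers that case.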

Final Code On Working System:

#!/usr/bin/python
import csv
import MySQLdb
import os

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='*******',
                       db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(open('data_csv.log'))

start_row = 0

def getSize(fileobject):
    fileobject.seek(0, 2)  # move the cursor to the end of the file
    return fileobject.tell()

log_file = open('data_csv.log', 'rb')
curr_file_size = getSize(log_file)
log_file.close()

# Get the file size recorded on the previous run (0 on the first run)
saved_file_size = 0
if os.path.exists("file_size"):
    with open("file_size") as f:
        saved_file_size = int(f.read())

# Get the last processed line
if os.path.exists("lastline"):
    with open("lastline") as f:
        start_row = int(f.read())

# If the file is smaller than last time, it has been flushed:
# start again from the top.
if curr_file_size < saved_file_size:
    start_row = 0

cur_row = 0
for row in csv_data:
    if cur_row >= start_row:
        cursor.execute('INSERT INTO `heating` VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)
        # Other processing if necessary
    cur_row += 1

mydb.commit()
cursor.close()

# Store the last processed line; cur_row is already the index of
# the next unread line, so store it as-is.
with open("lastline", 'w') as f:
    f.write(str(cur_row))

# Store the current file size so the next run can detect a flush
with open("file_size", 'w') as f:
    f.write(str(curr_file_size))

print cur_row  # not necessary, but handy for debugging
print "Done"

Edit: Final code submitted by ZeroG and now working on the system! Thank you also to Xion345 for helping.

Each CSV row seems to contain a timestamp. If these are always increasing, you could query the DB for the maximum timestamp already recorded, and skip all rows before that time when reading the CSV.
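
A rough sketch of that approach, reusing the thedate and thetime column names assumed in the ALTER statement above (adjust them to your schema):

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost', user='root',
                       passwd='******', db='kestrel_keep')
cursor = mydb.cursor()

# Find the newest timestamp already in the table (None if it is empty).
# CONCAT yields strings like '2013-02-21 21:42:00', which sort
# chronologically because the format is zero-padded.
cursor.execute('SELECT MAX(CONCAT(thedate, " ", thetime)) FROM `heating`')
last_seen = cursor.fetchone()[0]

for row in csv.reader(open('data_csv.log')):
    # row[0] is the date and row[1] is the time, per the CSV format above.
    if last_seen is not None and '%s %s' % (row[0], row[1]) <= last_seen:
        continue  # already in the database
    cursor.execute('INSERT INTO `heating` VALUES '
                   '(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
                   row)

mydb.commit()
cursor.close()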
