
Python Read a CSV and place values in MySQL Database

I am trying to get values from a CSV and put them into the database, and I am managing to do this without a great deal of trouble.

But I now need to write back to the CSV, so that the next time I run the script it will only enter values into the DB from below the mark in the CSV file.

Note the CSV file on the system automatically flushes every 24 hrs, so bear in mind there might not be a mark in the CSV. So basically, put all values in the database if no mark is found.

I am planning to run this script every 30 mins, so there could be 48 marks in the CSV file. Or should I remove the mark and move it down the file each time?

I have been deleting the file and then re-making a file in the script, so there is a new file on every script run, but this breaks the system somehow, so that is not a great option.

Hope you guys can help.

Thank you

Python Code:

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='******',
                       db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(file('data_csv.log'))

for row in csv_data:
    cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )',
                   row)

# Commit the inserts and close the connection to the database.
mydb.commit()
cursor.close()

print "Done"

My CSV file Format:

2013-02-21,21:42:00,-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2
2013-02-21,21:48:00,-1.0,45.8,27.5,17.3,13.9,22.3,20.9,1,1,2,2

It looks like the first field in your MySQL table is a unique timestamp. It is possible to set up the MySQL table so that the field must be unique, and to ignore INSERTs that would violate that uniqueness property. At a mysql> prompt, enter the command:

ALTER IGNORE TABLE heating ADD UNIQUE heatingidx (thedate, thetime)    

(Change thedate and thetime to the names of the columns holding the date and time.)


Once you make this change to your database, you only need to change one line in your program to make MySQL ignore duplicate insertions:

cursor.execute('INSERT IGNORE INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )', row)
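
With the unique index in place, re-running an insert for a row that is already in the table is silently skipped instead of raising a duplicate-key error. A sketch of what this looks like at the mysql> prompt, using the first sample row from the question (output approximate):

mysql> INSERT IGNORE INTO heating VALUES ('2013-02-21','21:42:00',-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT IGNORE INTO heating VALUES ('2013-02-21','21:42:00',-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2);
Query OK, 0 rows affected, 1 warning (0.00 sec)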

Yes, it is a little wasteful to be running INSERT IGNORE ... on lines that have already been processed, but given the frequency of your data (every 6 minutes?), it is not going to matter much in terms of performance.

The advantage to doing it this way is that it is now impossible to accidentally insert duplicates into your table. It also keeps the logic of your program simple and easy to read.

It also avoids having two programs write to the same CSV file at the same time. Even if your program usually succeeds without error, every so often -- maybe once in a blue moon -- your program and the other program may try to write to the file at the same time, which could result in an error or mangled data.


You can also make your program a little faster by using cursor.executemany instead of cursor.execute:

rows = list(csv_data)
cursor.executemany('''INSERT IGNORE INTO `heating` VALUES
    ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )''', rows)

is equivalent to

for row in csv_data:
    cursor.execute('INSERT IGNORE INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )',
                   row)

except that it packs all the data into one command.

I think a better option than "marking" the CSV file is to keep a separate file where you store the number of the last line you processed.

So if that file (the one where you store the number of the last processed line) does not exist, you process the whole CSV file. If the file exists, you only process records after that line.
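
A minimal sketch of this idea (the marker filename lastline is an assumption here, and the 13-column table is taken from the question):

import csv
import os
import MySQLdb

mydb = MySQLdb.connect(host='localhost', user='root',
                       passwd='******', db='kestrel_keep')
cursor = mydb.cursor()

# Read the number of the last processed line; default to 0 if the
# marker file does not exist yet (e.g. on the very first run).
start_row = 0
if os.path.exists('lastline'):
    with open('lastline') as f:
        start_row = int(f.read())

cur_row = 0
for row in csv.reader(file('data_csv.log')):
    if cur_row >= start_row:
        cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )', row)
    cur_row += 1

mydb.commit()
cursor.close()

# Remember how many lines have been handled so the next run can skip them.
with open('lastline', 'w') as f:
    f.write(str(cur_row))

Note that this sketch alone does not handle the 24-hour flush; the final code below adds a file-size check for exactly that case.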

Final Code On Working System:

#!/usr/bin/python
import csv
import MySQLdb
import os

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='*******',
                       db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(file('data_csv.log'))

start_row = 0
saved_file_size = 0  # default so the size check below works on the first run

def getSize(fileobject):
    fileobject.seek(0, 2)  # move the cursor to the end of the file
    size = fileobject.tell()
    return size

f = open('data_csv.log', 'rb')
curr_file_size = getSize(f)
f.close()

# Get the last file size
if os.path.exists("file_size"):
    with open("file_size") as f:
        saved_file_size = int(f.read())

# Get the last processed line
if os.path.exists("lastline"):
    with open("lastline") as f:
        start_row = int(f.read())

# If the file shrank, it was flushed, so start from the top again
if curr_file_size < saved_file_size:
    start_row = 0

cur_row = 0
for row in csv_data:
    if cur_row >= start_row:
        cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )', row)

        # Other processing if necessary

    cur_row += 1

mydb.commit()
cursor.close()

# Store the last processed line; cur_row already counts the lines
# handled, i.e. the index of the next unread line
with open("lastline", 'w') as f:
    f.write(str(cur_row))

# Store current file size to detect the 24-hour flush
with open("file_size", 'w') as f:
    f.write(str(curr_file_size))

# Not necessary, but good for debugging
print str(cur_row)

print "Done"

Edit: Final code submitted by ZeroG and now working on the system!! Thank you also to Xion345 for helping.

Each csv row seems to contain a timestamp. If these are always increasing, you could query the db for the maximum timestamp already recorded, and skip all rows before that time when reading the csv.
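
A rough sketch of that approach, assuming the date and time columns are named thedate and thetime as in the ALTER statement above (plain string comparison works here because the values are in ISO year-month-day order):

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost', user='root',
                       passwd='******', db='kestrel_keep')
cursor = mydb.cursor()

# Find the newest timestamp already recorded; None if the table is empty.
cursor.execute("SELECT MAX(CONCAT(thedate, ' ', thetime)) FROM `heating`")
last_seen = cursor.fetchone()[0]

for row in csv.reader(file('data_csv.log')):
    # row[0] is the date, row[1] the time; skip anything already stored.
    if last_seen is None or '%s %s' % (row[0], row[1]) > last_seen:
        cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s )', row)

mydb.commit()
cursor.close()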
