Insert into MySQL database after reading CSV file?
I have a CSV file like this:
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@gmail.com, 01-05-2014
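I need to read the CSV file above and extract the domain names, and also count the email addresses per domain name and date. All of this needs to be inserted into a MySQL database, but I am stuck on how to do the insert after iterating over the list I have built.
The query will look like this: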
INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES (%s, %s, %s);
Here is the code:
#!/usr/bin/python
import fileinput
import csv
import os
import sys
import MySQLdb
from collections import defaultdict
lst = defaultdict(list)
d_lst = defaultdict(list)
# ======================== Defined Functions ======================
def get_file_path(filename):
    currentdirpath = os.getcwd()
    # get current working directory path
    filepath = os.path.join(currentdirpath, filename)
    return filepath
# ===========================================================
def read_CSV(filepath):
    domain_list = []
    domain_date_list = []
    sorted_domain_list_bydate = defaultdict(list)
    with open(filepath, 'rb') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            # insert the 1st & 2nd column of the CSV file into a set called input_list
            email = row[0].strip().lower()
            date = row[1].strip()
            domain_date_list.append([date, email[ email.find("@") : ]])
            domain_list.append(email[ email.find("@") : ])
    for k, v in domain_date_list:
        sorted_domain_list_bydate[k].append(v)
    # remove duplicates from domain list
    domain_list = list(set(domain_list))
    return sorted_domain_list_bydate, domain_list
# ===========================================================
def update_DB(lst):
    # open a database connection
    db = MySQLdb.connect(host="localhost",  # your host, usually localhost
                         user="root",  # your username
                         passwd="abcdef1234",  # your password
                         db="test")  # name of the database
    cur = db.cursor()
    a = []
    for k, v in lst.items():
        # now what should I do here?
        # this is what I am confused about
        pass
    db.commit()
    db.close()
# ==========================================================
# ======================= main program =======================================
path = get_file_path('emails.csv')
[lst, d_lst] = read_CSV(path) # read the input file
update_DB(lst) # insert data into domains table
I am confused about the update_DB method.
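I am not sure why you have such a complicated program for a simple task. Let's start from the top:
First, you need to organize your data correctly by domain, date and count.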
import csv
from collections import defaultdict, Counter

domain_counts = defaultdict(Counter)
with open('somefile.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # key by domain, then by date; strip the space that follows the comma
        domain_counts[row[0].split('@')[1].strip()][row[1].strip()] += 1
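For readers on Python 3, the same counting step can be sketched as a small self-contained function. The name `count_domains` and the in-memory sample are illustrative assumptions rather than part of the answer; `skipinitialspace=True` is used so the blank after each comma in the sample file is dropped by the reader.

```python
import csv
import io
from collections import Counter, defaultdict


def count_domains(lines):
    """Return {domain: Counter({date: count})} for rows of 'email, date'."""
    domain_counts = defaultdict(Counter)
    # skipinitialspace drops the blank that follows each comma in the sample data
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2 or "@" not in row[0]:
            continue  # skip blank or malformed lines
        email = row[0].strip().lower()
        date = row[1].strip()
        domain = email.rpartition("@")[2]  # text after the last "@"
        domain_counts[domain][date] += 1
    return domain_counts


# Sample rows copied from the question, held in memory instead of a file.
sample = io.StringIO(
    "nohaelprince@uwaterloo.ca, 01-05-2014\n"
    "nohaelprince@uwaterloo.ca, 01-05-2014\n"
    "nohaelprince@uwaterloo.ca, 01-05-2014\n"
    "nohaelprince@gmail.com, 01-05-2014\n"
)
counts = count_domains(sample)
```

With a real file, an open file object (opened with `newline=''`, as the csv documentation recommends) could be passed in place of the `StringIO` sample.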
Next, you need to insert each row into the database correctly:
db = MySQLdb.connect(...)
cur = db.cursor()
q = 'INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES(%s, %s, %s)'
for domain, data in domain_counts.items():
    for email_date, email_count in data.items():
        cur.execute(q, (domain, email_count, email_date))
db.commit()
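When there are many rows, the nested mapping can also be flattened into a list of tuples and handed to `cursor.executemany`, which is part of the Python DB-API that MySQLdb implements. This is only a sketch: the helper name `to_rows` and the sample dictionary are assumptions made for illustration, and the database calls are left as comments because they need a live connection.

```python
from collections import Counter


def to_rows(domain_counts):
    """Flatten {domain: {date: count}} into (domain, count, date) tuples."""
    return [
        (domain, count, date)
        for domain, per_date in sorted(domain_counts.items())
        for date, count in sorted(per_date.items())
    ]


# Hypothetical sample shaped like the counts for the CSV in the question.
# The dates are still in the file's DD-MM-YYYY form at this point.
domain_counts = {
    "uwaterloo.ca": Counter({"01-05-2014": 3}),
    "gmail.com": Counter({"01-05-2014": 1}),
}
rows = to_rows(domain_counts)

# With an open MySQLdb connection `db` (not created here):
# q = "INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES (%s, %s, %s)"
# cur = db.cursor()
# cur.executemany(q, rows)
# db.commit()
```

The tuple order (domain, count, date) matches the column order in the INSERT statement, so each tuple fills the three placeholders directly.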
Since your dates are not being inserted correctly, try this updated query:
# literal % signs are doubled because the query is run with parameters via cur.execute(q, args)
q = """INSERT INTO
       domains(domain_name, cnt, date_of_entry)
       VALUES(%s, %s, STR_TO_DATE(%s, '%%d-%%m-%%Y'))"""
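An alternative to converting inside SQL is to parse the date in Python before inserting, since a MySQL DATE column accepts strings in YYYY-MM-DD form. This is a sketch of that alternative, not what the answer above does; `to_iso_date` is a made-up helper name, and the day-first reading of 01-05-2014 is taken from the day-month-year format passed to STR_TO_DATE above.

```python
from datetime import datetime


def to_iso_date(text):
    """Convert a 'DD-MM-YYYY' string (surrounding spaces allowed) to 'YYYY-MM-DD'."""
    return datetime.strptime(text.strip(), "%d-%m-%Y").date().isoformat()


# The leading space mimics the blank after the comma in the CSV file.
iso = to_iso_date(" 01-05-2014")
```

The converted string can then be passed as the third parameter of the plain three-placeholder INSERT, with no STR_TO_DATE call needed.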
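Here your read_CSV function returns sorted_domain_list_bydate and domain_list (which is a list), and these are what update_DB receives and where you do the insert.
Your list contains only domain names, whereas each key-value pair should hold a domain name and its count, such as: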
google.com,2
live.com,1
import time
for k, v in lst.items():
    # pass the values as parameters instead of concatenating them into the SQL string
    cur.execute("INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES (%s, %s, %s)", (k, v, time.strftime("%Y-%m-%d")))
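To get from the structure that read_CSV builds (each date mapped to a list of domains, with repeats) to domain and count pairs like the ones above, the repeats can be counted with `collections.Counter`. The sketch below assumes that structure; `rows_from_bydate` is a hypothetical name, and because read_CSV keeps the leading "@" in each domain, it is stripped here.

```python
from collections import Counter


def rows_from_bydate(sorted_domain_list_bydate):
    """Turn {date: [domain, domain, ...]} into (domain, count, date) tuples."""
    rows = []
    for date, domains in sorted(sorted_domain_list_bydate.items()):
        # count how often each domain occurs on this date
        counts = Counter(d.lstrip("@") for d in domains)
        for domain, cnt in sorted(counts.items()):
            rows.append((domain, cnt, date))
    return rows


# Hypothetical input shaped like read_CSV's first return value for the sample file.
by_date = {
    "01-05-2014": ["@uwaterloo.ca", "@uwaterloo.ca", "@uwaterloo.ca", "@gmail.com"],
}
rows = rows_from_bydate(by_date)
```

Each resulting tuple can be passed as the parameters of one parameterized INSERT, using the date from the file rather than today's date.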