简体   繁体   中英

Insert into MySQl database after reading csv file?

I have a csv file like this:

nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@uwaterloo.ca, 01-05-2014
nohaelprince@gmail.com, 01-05-2014

I need to read the above csv file and extract domain name and also the count of emails address by domain name and date as well. All these things I need to insert into MySQL database but somehow I am stuck how to insert into MySQL database after iterating the list I got.

Query will be like this:

INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES (%s, %s, %s);

Below is the code

#!/usr/bin/python
import fileinput
import csv
import os
import sys
import MySQLdb

from collections import defaultdict

lst = defaultdict(list)
d_lst = defaultdict(list)

# ======================== Defined Functions ======================
def get_file_path(filename):
    currentdirpath = os.getcwd()  
    # get current working directory path
    filepath = os.path.join(currentdirpath, filename)
    return filepath
# ===========================================================
def read_CSV(filepath):

   domain_list = []
   domain_date_list = []
   sorted_domain_list_bydate = defaultdict(list)

   with open(filepath, 'rb') as csvfile:
       reader = csv.reader(csvfile)

       for row in reader:
          # insert the 1st & 2nd column of the CSV file into a set called input_list
           email = row[0].strip().lower()
           date  = row[1].strip()

           domain_date_list.append([date, email[ email.find("@") : ]])
           domain_list.append(email[ email.find("@") : ])

   for k, v in domain_date_list: 
         sorted_domain_list_bydate[k].append(v)


   # remove duplicates from domain list
   domain_list = list(set(domain_list))

   return sorted_domain_list_bydate, domain_list
# ===========================================================
def update_DB(lst):

    # open a database connection
    db = MySQLdb.connect(host="localhost", # your host, usually localhost
                         user="root", # your username
                          passwd="abcdef1234", # your password
                          db="test") # name of the data base
    cur = db.cursor() 

    a = []
    for k, v in lst.items():
        # now what should I do here?
        # this is what I am confuse

    db.commit()
    db.close()
# ==========================================================

# ======================= main program =======================================
path = get_file_path('emails.csv') 
[lst, d_lst] = read_CSV(path) # read the input file
update_DB(lst) # insert data into domains table

I am confuse in update_DB method.

I am not sure why you have such a complicated program for a simple task. Lets start from the top:

  1. You need to first organize your data correctly by domain, date and then count.

     import csv from collections import defuaultdict, Counter domain_counts = defaultdict(Counter) with open('somefile.csv') as f: reader = csv.reader(f) for row in reader: domain_counts[row[0].split('@')[1].strip()][row[1]] += 1 
  2. Next, you need to insert each row correctly in the database:

     db = MySQLdb.connect(...) cur = db.cursor() q = 'INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES(%s, %s, %s)' for domain, data in domain_counts.iteritems(): for email_date, email_count in data.iteritems(): cur.execute(q, (domain, email_count, email_date)) db.commit() 

As your dates are not being inserted correctly, try this updated query instead:

q = """INSERT INTO 
          domains(domain_name, cnt, date_of_entry)
          VALUES(%s, %s, STR_TO_DATE(%s, '%d-%m-%Y'))"""

the read_csv function here is returning the sorteddomainlistbydate,and the domain_list (which is a list), used by the update_db function, that's where you do the insert.

your list just contains the domain names, while what it should for per pair of key vals should contain the domain name and the count like

google.com,2

live.com,1

for k, v in lst.items():
     cur.execute("INSERT INTO domains(domain_name, cnt, date_of_entry) VALUES ('" + str(k) + "','" + str(v) + "','" + str(time.strftime("%d/%m/%Y"))+"')")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM