
Import CSVs into different SQL tables

I have a directory full of CSVs that need to be imported into different tables of a SQL Server database. Fortunately, the filename of each CSV starts with the string "Concat_AAAAA_XX...", where the AAAAA part is an alphanumeric string and XX is a two-digit integer. Together they act as the key for a specific table in SQL.

My question is: what would be the most elegant way to write a Python script that takes the AAAAA & XX values from each filename and knows which table to import that data into?

CSV1 named: Concat_T101_14_20072021.csv
would need to be imported into Table A

CSV2 named: Concat_RB728_06_25072021.csv
would need to be imported into Table B

CSV3 named: Concat_T144_21_27072021.csv
would need to be imported into Table C

and so on...

I've read that the ConfigParser package may be able to help, but I cannot understand how to apply it here. The reason for suggesting ConfigParser is that I'd like the flexibility of editing a config file (eg "CONFIG.INI") rather than having to hard-code new entries into the Python script.
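
For illustration, here is a minimal sketch of the ConfigParser idea. The CONFIG.INI layout, the [tables] section name, and the "AAAAA_XX" key format are assumptions, not a fixed convention:

import configparser

# CONFIG.INI is assumed to look like:
# [tables]
# T101_14 = Table_A
# RB728_06 = Table_B
# T144_21 = Table_C
config = configparser.ConfigParser()
config.read('CONFIG.INI')

def table_for(key):
    # key is the "AAAAA_XX" part of a filename, e.g. "T101_14"
    return config['tables'][key]

print(table_for('T101_14'))  # -> Table_A

Adding a new file type then only means adding one line to CONFIG.INI, with no code changes.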

The code I have so far works for just a standalone dataset, which can be found here.

Here is the code I'm using:

import pypyodbc as odbc
import pandas as pd 
import os

os.chdir('SQL Loader')
df = pd.read_csv('Real-Time_Traffic_Incident_Reports.csv')

df['Published Date'] = pd.to_datetime(df['Published Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Status Date'] = pd.to_datetime(df['Status Date']).dt.strftime('%Y-%m-%d %H:%M:%S')

# drop rows where Location or Status is missing
df.dropna(subset=['Location', 'Status'], inplace=True)

columns = ['Traffic Report ID', 'Published Date', 'Issue Reported', 'Location', 
            'Address', 'Status', 'Status Date']

df_data = df[columns]
records = df_data.values.tolist()

DRIVER = 'SQL Server'
SERVER_NAME = 'MY SERVER'
DATABASE_NAME = 'MYDATABASE'

def connection_string(driver, server_name, database_name):
    conn_string = f"""
        DRIVER={{{driver}}};
        SERVER={server_name};
        DATABASE={database_name};
        Trusted_Connection=yes;
    """
    return conn_string

try:
    conn = odbc.connect(connection_string(DRIVER, SERVER_NAME, DATABASE_NAME))
except odbc.DatabaseError as e:
    print('Database Error:')
    print(str(e))
except odbc.Error as e:
    print('Connection Error:')
    print(str(e))


sql_insert = '''
    INSERT INTO Austin_Traffic_Incident 
    VALUES (?, ?, ?, ?, ?, ?, ?, GETDATE())
'''

try:
    cursor = conn.cursor()
    cursor.executemany(sql_insert, records)
    conn.commit()
except Exception as e:
    conn.rollback()
    print(str(e))
finally:
    print('Task is complete.')
    cursor.close()
    conn.close()

You can do a translation table using a dict, like:

import re
from glob import glob

translation_table = {
    '14': 'A', 
    '06': 'B',
    '21': 'C'
    }

# get all csv files from current directory
for filename in glob("*.csv"):

    # extract the AAAAA and XX parts with a regular expression
    # (can also be done easily with the split function)
    m = re.match(r"^Concat_([A-Za-z0-9]+)_([0-9]{2})_[0-9]{8}\.csv$", filename)
    if not m:
        continue  # skip files that don't follow the naming scheme

    # use the translation table (keyed on the XX part) to get the table name
    tablename = translation_table[m.group(2)]
    
    print(f"Data from file '{filename}' goes to table '{tablename}'")

I would say that there are multiple ways to do this kind of thing. You can use pure SQL, as I will illustrate below, or you can use Python. If you want a Python solution, just post back and I'll provide the code. Some people don't like it when answers recommend solutions outside of the specific technology listed in the original post, so here is the SQL solution.

DECLARE @intFlag INT
SET @intFlag = 1
WHILE (@intFlag <=48)
BEGIN

PRINT @intFlag


declare @fullpath1 varchar(1000)
select @fullpath1 = '''\\source\FTP1\' + convert(varchar, getdate()- @intFlag , 112) + '_SPGT.SPL'''
declare @cmd1 nvarchar(1000)
select @cmd1 = 'bulk insert [dbo].[table1] from ' + @fullpath1 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd1)

-------------------------------------------

declare @fullpath2 varchar(1000)
select @fullpath2 = '''\\source\FTP2\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C.SPL'''
declare @cmd2 nvarchar(1000)
select @cmd2 = 'bulk insert [dbo].[table2] from ' + @fullpath2 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 5, ROWTERMINATOR=''0x0a'')'
exec (@cmd2)

-------------------------------------------

declare @fullpath3 varchar(1000)
select @fullpath3 = '''\\source\FTP3\' + convert(varchar, getdate()-@intFlag, 112) + '_SPBMI_GL_PROP_USD_C_ADJ.SPC'''
declare @cmd3 nvarchar(1000)
select @cmd3 = 'bulk insert [dbo].[table3] from ' + @fullpath3 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd3)

-------------------------------------------

declare @fullpath4 varchar(1000)
select @fullpath4 = '''\\source\FTP4\' + convert(varchar, getdate()-@intFlag, 112) + '_SPGTINFRA_ADJ.SPC'''
declare @cmd4 nvarchar(1000)
select @cmd4 = 'bulk insert [dbo].[table4] from ' + @fullpath4 + ' with (FIELDTERMINATOR = ''\t'', FIRSTROW = 7, ROWTERMINATOR=''0x0a'')'
exec (@cmd4)

SET @intFlag = @intFlag + 1
    
END
GO

Here is the Python solution that you asked for.

The Python solution is waaayyyy easier, of course.

import pandas as pd
from glob import glob
from sqlalchemy import create_engine

# to_sql needs a SQLAlchemy engine, not a bare connection string
engine = create_engine(
    "mssql+pyodbc://server_name/db_name"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

all_files = glob("*.csv")

for f in all_files:
    # load each file into a dataframe...something like...
    df = pd.read_csv(f, delimiter='\t', skiprows=0, header=[0])
    # you may or may not need to append dataframes first; depends on your setup

    # table_name comes from your filename-to-table mapping
    df.to_sql(table_name, engine, if_exists='replace', index=True, chunksize=100000)
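
Putting the two answers together, a minimal end-to-end sketch; the server, database, and table names are placeholders, and the mapping keys off the two-digit XX value as in the dict above:

import re
from glob import glob

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://MY_SERVER/MYDATABASE"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

translation_table = {'14': 'Table_A', '06': 'Table_B', '21': 'Table_C'}

for filename in glob("Concat_*.csv"):
    m = re.match(r"^Concat_([A-Za-z0-9]+)_([0-9]{2})_[0-9]{8}\.csv$", filename)
    if not m:
        continue  # skip files that don't follow the naming scheme
    table_name = translation_table[m.group(2)]
    df = pd.read_csv(filename)
    # if_exists='append' adds to the table instead of replacing it
    df.to_sql(table_name, engine, if_exists='append', index=False, chunksize=100000)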
