
Parsing data from a text file and storing it in a database

First and foremost, thank you for any and all help you can offer.

The problem: we receive data in TXT format and need to be able to parse that data into some form of database/repository.

The idea is that every day between _____ and ____ hours a .txt file containing data is created, for example "Newdata20220629.txt".

However, this data is extremely hard to read and almost impossible to search in its raw form. The txt file is raw; however, its first line contains the column names for each row of data, such as "Name, Date, File number," etc.

The following rows are raw data in the order of those categories, for instance: John Smith, 6/29/2022, 1234123

Any column without data still has its comma but contains no value, such as:

John Smith,, or ,6/29/2022,
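
As a rough illustration of the format described above, here is a minimal Python sketch using the standard `csv` module. The sample text and the helper name `parse_rows` are made up for the example; empty fields are mapped to `None` on the assumption that they will later be stored as NULL.

```python
import csv
import io

def parse_rows(text):
    """Parse comma-separated text whose first line is the header row."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        # Strip stray whitespace and map empty fields to None.
        rows.append({
            key.strip(): ((value or "").strip() or None)
            for key, value in record.items()
            if key is not None  # ignore surplus fields beyond the header
        })
    return rows

# Hypothetical sample mirroring the rows described above.
sample = (
    "Name, Date, File number\n"
    "John Smith, 6/29/2022, 1234123\n"
    "John Smith,,\n"
    ",6/29/2022,\n"
)
rows = parse_rows(sample)
for row in rows:
    print(row)
```

Every value stays a string at this stage; converting dates or numbers is left to a later step.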

So essentially what I'd like to do is create a tool that runs continuously, looks for a file named in the format "Newdata(date).txt", parses that text as described above, and stores it in a user-friendly, searchable database. Personally, I think a SQL database may be the easiest way to do this, but I don't have a clue where to start.

I suggest using the SqlBulkCopy class (described here: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?view=dotnet-plat-ext-6.0 ) in conjunction with the CsvDataReader class ( https://joshclose.github.io/CsvHelper/examples/csvdatareader/ ), as demonstrated below. You will also need to research file system watchers (as Hursey said) so that your app is notified when a new file is written to the folder you are monitoring.

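Imports System.Data.SqlClient ' SqlBulkCopy
Imports System.IO ' FileStream, FileMode

Protected Sub UploadCSV(filePath As String)
    ' Create CsvDataReader (IDataReader) to use with SqlBulkCopy
    ' Assumes a CsvDataReader that accepts a Stream and exposes Settings/Columns;
    ' verify these members against the library you reference.
    Using csvData = New CsvDataReader(New FileStream(filePath, FileMode.Open))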
        ' Reads first record as a header row.
        ' Name columns based on the values in the header row
        csvData.Settings.HasHeaders = True
        ' Set data types for parsing data
        csvData.Columns.Add("varchar") ' Column 1
        csvData.Columns.Add("varchar") ' Column 2
        csvData.Columns.Add("datetime") ' Column 3
        csvData.Columns.Add("decimal(18,2)") ' Column 4
        ' Create SqlBulkCopy object to import from the CsvDataReader
        Using bulkCopy = New SqlBulkCopy("Data Source=.;Initial Catalog=YourDatabase;User ID=YourUsername;Password=YourPassword")
            ' Table to write to (must already exist).
            bulkCopy.DestinationTableName = "YourSQLTable"
            ' Map CSV column names to SQL columns names
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_1", "SQL_Column_1") 
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_2", "SQL_Column_2")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_3", "SQL_Column_3")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_4", "SQL_Column_4")
            ' Do the import
            bulkCopy.WriteToServer(csvData)
        End Using ' dispose SqlBulkCopy
    End Using ' dispose CsvDataReader
End Sub 
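
The file system watcher mentioned above is event-driven. A simpler polling-based alternative can be sketched in Python with only the standard library. This is not the watcher approach itself, just a stand-in under stated assumptions: files are named `Newdata<YYYYMMDD>.txt`, and the names `find_new_files`, `watch`, `handle_file`, and `max_polls` are hypothetical.

```python
import re
import time
from pathlib import Path

# Assumed naming convention: Newdata<YYYYMMDD>.txt
FILE_PATTERN = re.compile(r"^Newdata(\d{8})\.txt$")

def find_new_files(folder, already_seen):
    """Return matching files in `folder` whose names are not in `already_seen`."""
    matches = [
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and FILE_PATTERN.match(path.name)
        and path.name not in already_seen
    ]
    # The date is part of the name, so sorting by name sorts by date.
    return sorted(matches, key=lambda p: p.name)

def watch(folder, handle_file, poll_seconds=60, max_polls=None):
    """Poll `folder` and call `handle_file(path)` once per new matching file.

    `max_polls=None` means run until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(folder, seen):
            handle_file(path)
            seen.add(path.name)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A call such as `watch("C:/incoming", print)` would print each new file path once. The set of seen names lives only in memory, so a restart would reprocess existing files unless that state is persisted, and a file that is still being written when it is detected may be read incompletely.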

This should take a .txt file, write it to a .csv, then write the .csv to a SQL database table. Just enter your server information and the file paths.

Import these to use:

import pandas as pd
import pyodbc

Read txt file and write to csv file

read_txt = pd.read_csv('Newdata20220629.txt', delimiter=',')
# to_csv returns None when given a path, so there is nothing to assign
read_txt.to_csv('Newdata20220629.csv', index=False)

Import CSV

# Raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'C:\Users\ExampleUser\Desktop\Test\Newdata20220629.csv')

Connect to SQL Server

connection = pyodbc.connect('Driver={SQL Server};'
                      r'Server=RON\SQLEXPRESS;'  # raw string keeps the backslash literal
                      'Database=test_database;'
                      'Trusted_Connection=yes;')
cursor = connection.cursor()

Create Table

cursor.execute('''
        CREATE TABLE Table_Name (
            Name nvarchar(50),
            Date nvarchar(50),
            Product_ID Int
            )
               ''')
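
The table above stores Date as nvarchar(50), which keeps the import simple but makes date-range searches awkward. Assuming dates always arrive as M/D/YYYY, as in the sample row from the question, a small helper (hypothetical name `to_iso_date`) could normalize them before insertion so that the column could be declared with a date type instead.

```python
from datetime import datetime

def to_iso_date(text):
    """Convert an M/D/YYYY string to YYYY-MM-DD; return None for empty input."""
    if text is None or not text.strip():
        return None
    # strptime accepts months and days with or without a leading zero.
    return datetime.strptime(text.strip(), "%m/%d/%Y").date().isoformat()

print(to_iso_date("6/29/2022"))  # 2022-06-29
print(to_iso_date(""))           # None
```

A value in any other format raises `ValueError`, which is probably preferable to silently storing a malformed date.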

Insert DataFrame to Table

for row in df.itertuples():
    cursor.execute('''
                INSERT INTO Table_Name (Name, Date, Product_ID)
                VALUES (?, ?, ?)
                ''',
                row.Name, 
                row.Date,
                row.Product_ID
                )
connection.commit()
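
Two things are worth handling if this runs every day: a plain CREATE TABLE fails once the table already exists, and empty fields need to end up as NULL. The sketch below uses Python's built-in `sqlite3` as a stand-in for SQL Server so that it runs without a server. The function name `store_rows` and the sample rows are hypothetical. pyodbc also uses `?` placeholders and provides `executemany`, but `CREATE TABLE IF NOT EXISTS` is SQLite syntax; SQL Server needs a different existence check.

```python
import sqlite3

def store_rows(connection, rows):
    """Insert (name, date, product_id) tuples; None is stored as NULL."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS Table_Name (
            Name TEXT,
            Date TEXT,
            Product_ID INTEGER
        )
        """
    )
    # Parameterized insert: values are passed separately from the SQL text.
    connection.executemany(
        "INSERT INTO Table_Name (Name, Date, Product_ID) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()

connection = sqlite3.connect(":memory:")  # in-memory database for the demo
store_rows(connection, [("John Smith", "6/29/2022", 1234123), ("John Smith", None, None)])
store_rows(connection, [(None, "6/29/2022", None)])  # second run reuses the table

# Searching is then an ordinary parameterized query.
matches = connection.execute(
    "SELECT Name, Date, Product_ID FROM Table_Name WHERE Name = ?",
    ("John Smith",),
).fetchall()
print(matches)
```

When the rows come from a pandas DataFrame, missing values appear as NaN rather than `None`, so they would need converting before being passed as parameters.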

