Using Python to split a data text file into several text files for MySQL

I have a data txt file that is formatted to be loaded into a database (MySQL). It looks like this (somewhat exaggerated):

data.txt

name   age profession datestamp
John   23  engineer   2020-03-01
Amy    17  doctor     2020-02-27
Gordon 19  artist     2020-02-27
Kevin  25  chef       2020-03-01

The above is generated through Python by executing the following statement:

LOAD DATA LOCAL INFILE '/home/sample_data/data.txt' REPLACE INTO TABLE person_professions 
FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n'
 (name,age,profession,datestamp)

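That creates data.txt. However, data.txt is far too large to insert into the database in one go (there is an insert limit of about 200 MB), so I would like to split the data into several chunks (data_1.txt, data_2.txt, data_3.txt, etc.) and insert them one at a time to avoid hitting the insert size limit. I know you can go through the file line by line and look for a condition on which to cut the data, e.g.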

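with open('data.txt', 'r') as f:  # open for reading ('w' would truncate the file)
    data = f.read().split('\n')
    if some_condition:  # pseudocode: the breakpoint condition I can't work out
        with open('data_1.txt', 'w') as f2:
            f2.write(...)  # pseudocode: write the current chunk of data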

But I'm not sure how to come up with a breakpoint condition that makes it start writing to a new txt file, unless there is a better way.

I wrote a function that can do the job, depending on the size of the file. The explanation is in the code comments.

def split_file(file_name, lines_per_file=100000):
    # Open the large file for reading as UTF-8
    with open(file_name, 'r', encoding='utf-8') as rf:
        # Read all lines in the file into memory
        lines = rf.readlines()
    print(f'{len(lines)} LINES READ.')
    # Counters for the number of files created and the lines written
    file_no = 0
    wlines_count = 0
    # Step through the lines in chunks of lines_per_file
    for x in range(0, len(lines), lines_per_file):
        chunk = lines[x:x + lines_per_file]
        # Open a new "split" file for writing as UTF-8
        with open(f'data-{file_no}.txt', 'w', encoding='utf-8') as wf:
            # Write the lines of this chunk
            wf.writelines(chunk)
        # Update the written lines count
        wlines_count += len(chunk)
        # Update the "split" file count, mainly used for naming
        file_no += 1
    print(f'{wlines_count} LINES WRITTEN IN {file_no} FILES.')

# Split data.txt into files containing 100000 lines each
split_file('data.txt', 100000)
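
One caveat: the function above reads the whole file into memory and splits by line count, while the limit in the question is a size (about 200 MB). Below is a minimal sketch of a variant that streams the file and starts a new chunk once a byte budget would be exceeded. The names split_file_by_size, max_bytes and prefix are hypothetical, and the 100 MB default is an arbitrary assumption rather than a value from the question; pick whatever stays safely under your actual limit.

```python
def split_file_by_size(file_name, max_bytes=100 * 1024 * 1024, prefix='data_'):
    """Stream file_name into prefix1.txt, prefix2.txt, ... of at most max_bytes each.

    Lines are never cut in half; a single line longer than max_bytes
    still gets a file of its own. Returns the list of files written.
    """
    paths = []
    out = None
    written = 0
    try:
        # Binary mode: sizes are exact byte counts and the 0x01 field
        # separators are copied through unchanged.
        with open(file_name, 'rb') as src:
            for line in src:
                # Breakpoint condition: this line would push the current
                # chunk past the byte budget, so start a new file.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = f'{prefix}{len(paths) + 1}.txt'
                    out = open(path, 'wb')
                    paths.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        # Make sure the last chunk is closed even if an error occurs
        if out is not None:
            out.close()
    return paths
```

Calling split_file_by_size('data.txt') would write data_1.txt, data_2.txt, and so on, and return their paths, so each one can then be passed to its own LOAD DATA LOCAL INFILE statement. Only one line is held in memory at a time, which matters more as the source file grows.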
