
Splitting / Slicing Text File with Python

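I'm learning Python, and I've been trying to split this txt file into multiple files, grouped by a sliced string at the beginning of each line.

At the moment I have two issues:

1 - The string can have 5 or 6 characters and is terminated by a space (for example WSON33 and JHSF3).

Here is a sample of the file I want to split (the first line is a header):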

H24/06/202000003TORDISTD 
BWSON33      0803805000000000016400000003250C000002980002415324C1 0000000000000000
BJHSF3       0804608800000000003500000000715V000020280000031810C1 0000000000000000
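As a rough illustration of issue 1, the leading string can be pulled out with `str.split()`, which cuts at the first run of whitespace regardless of whether the string has 5 or 6 characters. The helper name `first_field` is made up for this sketch, and treating the leading `B` as a record-type marker (rather than part of the code) is an assumption based on the examples WSON33 and JHSF3 given above.

```python
# Two data lines from the sample above.
sample = [
    "BWSON33      0803805000000000016400000003250C000002980002415324C1 0000000000000000",
    "BJHSF3       0804608800000000003500000000715V000020280000031810C1 0000000000000000",
]


def first_field(line):
    """Return everything before the first run of whitespace (hypothetical helper)."""
    return line.split()[0]


keys = [first_field(line) for line in sample]
print(keys)  # ['BWSON33', 'BJHSF3']

# Assumption: the leading 'B' is a record-type marker, not part of the code.
codes = [key[1:] for key in keys]
print(codes)  # ['WSON33', 'JHSF3']
```

Because `split()` works on whitespace, no fixed slice width is needed to handle both lengths.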

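2 - I've come across a lot of code, but I haven't been able to put everything together so that it works:

I adapted this code from another post. It does split the input into multiple files, but the lines need to be sorted before I start writing the files, and I also need the header copied into each file rather than isolated in a file of its own.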

import itertools

with open('tordist.txt', 'r') as fin:

    # group each line in input file by first part of split
    # (groupby only merges adjacent lines, so unsorted input yields repeated groups)
    for i, (k, g) in enumerate(itertools.groupby(fin, lambda l: l.split()[0]), 1):
        # create file to write to, prefixed with group number - start = 1
        with open('{0} tordist.txt'.format(i), 'w') as fout:

            # for each line in group write it to file
            for line in g:
                fout.write(line.strip() + '\n')
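One possible way to cover the two missing pieces (sorting first, and repeating the header in every file) is sketched below. It is not the original poster's final code: the function name `split_by_first_field`, the `out_dir` parameter, the choice to name each file after its key instead of a group number, and the shortened demo rows are all assumptions made for illustration.

```python
import itertools
import os
import tempfile


def first_field(line):
    """Key used for both sorting and grouping (hypothetical helper)."""
    return line.split()[0]


def split_by_first_field(src_path, out_dir):
    """Write one file per first-field group, each starting with the header."""
    with open(src_path, 'r') as fin:
        header = fin.readline()                         # first line is the header
        body = [line for line in fin if line.strip()]   # drop blank lines
    # groupby only merges adjacent lines, so sort by the same key first
    body.sort(key=first_field)
    written = []
    for key, group in itertools.groupby(body, key=first_field):
        out_path = os.path.join(out_dir, '{0} tordist.txt'.format(key))
        with open(out_path, 'w') as fout:
            fout.write(header.rstrip() + '\n')          # copy header into every file
            for line in group:
                fout.write(line.rstrip() + '\n')
        written.append(out_path)
    return written


# Demo on a throwaway directory; the data rows are shortened placeholders.
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'tordist.txt')
with open(src, 'w') as f:
    f.write('H24/06/202000003TORDISTD \n'
            'BWSON33      row-1\n'
            'BJHSF3       row-2\n'
            'BWSON33      row-3\n')
paths = split_by_first_field(src, demo_dir)
print([os.path.basename(p) for p in paths])
```

Python's sort is stable, so lines that share a key keep their original relative order inside each output file.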

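So, as I understand it, you have a text file with many lines, where each line begins with a short string of 5 or 6 characters. It sounds like you want all the lines that begin with the same string to go into the same file, so that after running the code you have as many new files as there are unique starting strings. Is that accurate?

Like you, I'm still new to Python, so I'm sure there are more compact ways to do this. The code below loops through the file several times and creates the new files in the same folder as the text file and the Python file.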

# Separates lines in a file by an identifier
# and makes a new file for each identifier group.

filename = input('type filename: ')
if len(filename) < 1:
    filename = "mk_newfiles.txt"

with open(filename) as filehandle:
    # This chunk loops through the file, looking at the beginning of each line,
    # and adding it to a list of identifiers if it is not on the list already.
    Unique = list()
    for line in filehandle:
        # like Lalit said, split is a simple way to separate a longer string
        parts = line.split()
        # blank lines have no first field, so skip them
        if parts and parts[0] not in Unique:
            Unique.append(parts[0])

    # For each item in the list of identifiers, this code goes through
    # the file, and if a line starts with that identifier then it is
    # added to a new file.
    for item in Unique:
        # this 'if' skips the header, which has a '/' in it
        if '/' not in item:
            # .seek(0) rewinds the file to the start, which is needed
            # when looping through a file more than once
            filehandle.seek(0)

            # makes new file; 'with' closes it when the block ends
            with open(str(item) + ".txt", "w") as newfile:
                # inserts header (the first identifier found), and goes to next line
                newfile.write(Unique[0])
                newfile.write('\n')

                # goes through old file, and adds relevant lines to new file
                for line in filehandle:
                    split_line = line.split()
                    if split_line and item == split_line[0]:
                        newfile.write(line)

print(Unique)
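The answer above rereads the input once per identifier. A more compact variant, sketched here under the assumption that the whole file fits in memory, collects the lines into a dictionary in a single pass and writes the files afterwards. The names `group_lines` and `write_groups` are invented for this sketch, and the demo rows are shortened placeholders rather than real data.

```python
import os
from collections import defaultdict


def group_lines(lines):
    """Return (header, groups); groups maps each first field to its lines."""
    iterator = iter(lines)
    header = next(iterator, '')          # first line is treated as the header
    groups = defaultdict(list)
    for line in iterator:
        parts = line.split()
        if parts:                        # ignore blank lines
            groups[parts[0]].append(line)
    return header, dict(groups)


def write_groups(header, groups, out_dir='.'):
    """Write '<key>.txt' for every group, with the header as the first line."""
    for key, members in groups.items():
        with open(os.path.join(out_dir, key + '.txt'), 'w') as out:
            out.write(header.rstrip('\n') + '\n')
            for line in members:
                out.write(line.rstrip('\n') + '\n')


# In-memory demo; write_groups is defined but not called here.
demo_lines = [
    'H24/06/202000003TORDISTD \n',
    'BWSON33 row-1\n',
    'BJHSF3 row-2\n',
    'BWSON33 row-3\n',
]
header, groups = group_lines(demo_lines)
print(sorted(groups))
```

With a real file, `group_lines(open(filename))` followed by `write_groups(header, groups)` would produce the same kind of per-identifier files, and no sorting is needed because the dictionary gathers non-adjacent lines.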

