
How to split a text file into a number of text files using Python

I have a huge text file that contains a dataset like this:

EOG61ZHH8   ENSRNOG00000004762  627
EOG61ZHH8   ENSRNOG00000004762  627
EOG61ZHH9   ENSG00000249709 1075
EOG61ZHH9   ENSG00000249709 230
EOG61ZHH9   ENSG00000249709 87
EOG61ZHHB   ENSG00000134030 2347
EOG61ZHHB   ENSG00000134030 3658
EOG61ZHHB   ENSRNOG00000018342  241
EOG61ZHHB   ENSRNOG00000018342  241
EOG61ZHHC   ENSBTAG00000006084  1159
EOG61ZHHC   ENSG00000158828 820
EOG61ZHHC   ENSMMUG00000000126  631

I want to convert or split it like this:

EOG61ZHH8.txt
ENSRNOG00000004762  627
ENSRNOG00000004762  627
EOG61ZHH9.txt
ENSG00000249709 1075
ENSG00000249709 230
ENSG00000249709 87

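And so on. I don't know where to start to produce new .txt files from the text file above. I have done this before, but in that case each entry was preceded by a '[' sign. Now I have many files without any special marker to split on. This is the code I used in Python for that earlier format: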

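out = None
with open("entry.txt") as f:
    for line in f:
        if line[0] == "[":
            # A "[" line starts a new entry; its second token names the output file
            if out:
                out.close()
            out = open(line.split()[1] + ".txt", "w")
        else:
            out.write(line)
if out:
    out.close()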

I'm doing this on Windows. I already know about the Linux awk command, so I don't need Linux-specific solutions.
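The transformation being asked for can be stated per line: the first column names the output file, and the rest of the line is what gets written to it. A minimal sketch of that mapping, assuming whitespace-separated columns (spaces or tabs); `split_line` is a hypothetical helper name, not part of the question:

```python
def split_line(line):
    """Return (output filename, text to write) for one input line."""
    # Hypothetical helper: assumes whitespace-separated columns with the
    # group ID (e.g. EOG61ZHH8) in the first column.
    # split(None, 1) splits once on the first run of whitespace.
    group_id, rest = line.split(None, 1)
    return group_id + ".txt", rest


print(split_line("EOG61ZHH8   ENSRNOG00000004762  627\n"))
# → ('EOG61ZHH8.txt', 'ENSRNOG00000004762  627\n')
```

The remainder keeps its original inner spacing and trailing newline, so it can be written to the output file unchanged.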

You only need to make a few adjustments to your script:

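out = None
oldfile = None
with open("entry.txt") as f:
    for line in f:
        # Assumes tab-separated columns; the first one is the output file name
        newfile = line.split("\t")[0]
        if newfile != oldfile:
            # A new ID starts: close the previous file and open a new one
            if out:
                out.close()
            out = open(newfile + ".txt", "w")
            oldfile = newfile
        # Write the remaining columns (the line's trailing newline is kept)
        out.write("\t".join(line.split("\t")[1:]))
if out:
    out.close()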
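The same "start a new file whenever the ID changes" idea can also be written with `itertools.groupby`, which groups consecutive lines by a key. A sketch under stated assumptions: `split_by_id` is a hypothetical helper, columns are separated by any whitespace (so it works whether the file uses tabs or spaces), and all lines for one ID are adjacent, as in the sample data.

```python
import itertools
import os


def split_by_id(input_path, out_dir="."):
    """Write each run of lines sharing a first-column ID to <ID>.txt."""
    # Hypothetical helper; assumes whitespace-separated columns and that
    # all lines for one ID are consecutive in the input.
    with open(input_path) as f:
        # Skip blank lines, then split each line into [ID, rest of line]
        rows = (line.split(None, 1) for line in f if line.strip())
        for group_id, group in itertools.groupby(rows, key=lambda r: r[0]):
            path = os.path.join(out_dir, group_id + ".txt")
            # "with" closes each output file, including the last one
            with open(path, "w") as out:
                for _, rest in group:
                    out.write(rest)
```

Called as `split_by_id("entry.txt")`, it would write `EOG61ZHH8.txt`, `EOG61ZHH9.txt`, and so on into the current directory. Because `groupby` only groups consecutive items, an ID that reappears later in the file would overwrite its earlier file, which is the same limitation as the loop above; sort the input by ID first if that can happen.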

Using a regular expression:

import re

string = '    EOG61ZHH8   ENSRNOG00000004762  627    EOG61ZHH8   ENSRNOG00000004762  627    EOG61ZHH9   ENSG00000249709 1075    EOG61ZHH9   ENSG00000249709 230    EOG61ZHH9   ENSG00000249709 87    EOG61ZHHB   ENSG00000134030 2347    EOG61ZHHB   ENSG00000134030 3658    EOG61ZHHB   ENSRNOG00000018342  241    EOG61ZHHB   ENSRNOG00000018342  241    EOG61ZHHC   ENSBTAG00000006084  1159    EOG61ZHHC   ENSG00000158828 820    EOG61ZHHC   ENSMMUG00000000126  631'

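# Each match is a (group ID, gene ID, number) tuple; the raw string avoids
# invalid-escape warnings for "\s" and "\d"
result = re.findall(r'\s+(.*?)\s+(.*?)\s+(\d+)', string, re.S)

# Map each group ID to the text that goes into its file
buffer = {}

for i in result:
    if i[0] not in buffer:
        buffer[i[0]] = ''

    buffer[i[0]] = buffer[i[0]] + i[1] + '  ' + i[2] + '\n'

# Python 3: dict.items() replaces dict.iteritems(), and print is a function
for i in buffer.items():
    print(i)

    filename = i[0] + '.txt'
    content = i[1]  # you could strip the trailing "\n" here if wanted

    # Create the file named "filename" and write "content" to it
    with open(filename, 'w') as out:
        out.write(content)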
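The answer above parses a hard-coded string. A sketch of the same buffer-per-ID idea that reads `entry.txt` line by line instead, using `collections.defaultdict(list)` in place of the manual `if not in buffer` check. Assumptions: `split_by_id_buffered` is a hypothetical helper name, columns are whitespace-separated, and the remaining columns are re-joined with a tab (the question does not specify the output separator). Because everything is collected before writing, IDs do not need to be adjacent, at the cost of holding the whole file in memory, which matters for a very large input.

```python
import os
from collections import defaultdict


def split_by_id_buffered(input_path, out_dir="."):
    """Group lines by first-column ID in memory, then write one file per ID."""
    # Hypothetical helper; assumes whitespace-separated columns.
    groups = defaultdict(list)
    with open(input_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Keep the remaining columns, re-joined with a tab (an assumption)
            groups[fields[0]].append("\t".join(fields[1:]) + "\n")
    # One output file per ID, even if its lines were scattered in the input
    for group_id, lines in groups.items():
        with open(os.path.join(out_dir, group_id + ".txt"), "w") as out:
            out.writelines(lines)
    return sorted(groups)
```

The returned list of IDs is only a convenience for checking which files were written.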


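Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.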

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM