簡體   English   中英

在python中編輯文本文件並制作一個新文件

[英]editing a text file in python and making a new one

我有一個這樣的文本文件:

>ENST00000511961.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370661.3|RNF14-003|RNF14|278
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQ
>ENST00000506822.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370662.1|RNF14-004|GAPDH|132
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKE
>ENST00000513019.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370663.1|RNF14-005|ACTB|99
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLS
>ENST00000356143.1|ENSG00000013561.13|OTTHUMG00000129660.5|-|RNF14-202|HELLE|474
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQVKELVEAELFARYDRLLLQSSLDLMADVVYCPRPCCQLPVMQEPGCTMGICSSCNFAFCTLCRLTYHGVSPCKVTAEKLMDLRNEYLQADEANKRLLDQRYGKRVIQKAL

我想在python list以“ > ”開頭的行的第6個元素。 為此,我首先使用python創建dictionary ,然后鍵應該是我想要的list 像這樣:

from itertools import groupby
with open('infile.txt') as f:
    groups = groupby(f, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1],""))
            d[key] = val


k = d.keys()
res = [el[5:] for s in k for el in s.split("|")]

但是它將返回該行中所有以">".開頭的元素">".

你知道如何解決嗎?

這是預期的輸出:

["RNF14", "GAPDH", "ACTB", "HELLE"]

這應該有所幫助。 ->使用簡單的迭代, str.startswithstr.split

演示:

res = []
with open(filename, "r") as infile:
    for line in infile:
        if line.startswith(">"):
            val = line.split("|")
            res.append(val[5])
print(res)

輸出:

['RNF14', 'GAPDH', 'ACTB', 'HELLE']

在您的代碼中替換

res = [el[5:] for s in k for el in s.split("|")]

res = [s.split("|")[5] for s in k ] #Should work.

使用filter而不是groupby和map的解決方案

with open('infile.txt') as f:
    lines = f.readlines()

groups = filter(lambda x: x.startswith(">"), lines)
res = list(map(lambda x: x.split('|')[5],groups))

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM