
How can I write output from a for loop in Python into a CSV-formatted file?

Below is a Python script that checks whether certain words are found in each file of a list of files.

experiment=open('potentiation.txt')
lines=experiment.read().splitlines()
receptors=['crystal_1.txt', 'modeller_1.txt', 'moe_1.txt',
           'nci5_modeller0000_1.txt', 'nci5_modeller0001_1.txt',
           'nci5_modeller0002_1.txt', 'nci5_modeller0003_1.txt',
           'nci5_modeller0004_1.txt', 'nci5_modeller0005_1.txt',
           'nci5_modeller0006_1.txt', 'nci5_modeller0007_1.txt',
           'nci5_modeller0008_1.txt', 'nci5_modeller0009_1.txt',
           'nci5_modeller0010_1.txt', 'nci5_modeller0011_1.txt',
           'nci5_moe0000_1.txt', 'nci5_moe0001_1.txt', 'nci5_moe0002_1.txt',
           'nci5_moe0003_1.txt', 'nci5_moe0004_1.txt', 'nci5_moe0005_1.txt',
           'nci5_moe0006_1.txt', 'nci5_moe0007_1.txt', 'nci5_moe0008_1.txt',
           'nci5_moe0009_1.txt', 'nci5_moe0010_1.txt', 'nci5_moe0011_1.txt',
           'nci5_moe0012_1.txt', 'nci5_moe0013_1.txt', 'nci5_moe0014_1.txt']

for ligand in lines:
    for protein in receptors:
        file1=open(protein,"r")
        read1=file1.read()
        find_hit=read1.find(ligand)
        if find_hit == -1:
            print ligand,protein,"Not Found"
        else:
            print ligand,protein, "Found"

A sample of the output from this code looks like this:

345647 nci5_moe0012_1.txt Not Found
345647 nci5_moe0013_1.txt Not Found
345647 nci5_moe0014_1.txt Found

My question is: how can I take this output and format it as a CSV file, like the example below?

Ligand  nci5_moe0012_1  nci5_moe0013_1  nci5_moe0014_1
345647  Not Found       Not Found       Found

I think something like this would do it (assuming your output file is tab-delimited):

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

with open('potentiation.txt', 'rt') as experiment, \
     open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    for ligand in (line.rstrip() for line in experiment):
        row = [ligand]
        for protein in receptors:
            with open(protein+'.txt', "rt") as file1:
                # str.find() returns -1 when the ligand is absent from the file
                found = 'Not Found' if file1.read().find(ligand) == -1 else 'Found'
                row.append(found)
        csv_writer.writerow(row)

print('output.csv file written')
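
The script above opens the output file with 'wb', which is the Python 2 convention for the csv module. On Python 3 the csv module expects a text-mode file opened with newline=''. The following is a minimal sketch of the writing step only, assuming Python 3; the helper name `write_table`, the file name `output_demo.csv`, and the sample row are hypothetical and chosen to mirror the example in the question.

```python
import csv

def write_table(path, header, rows):
    """Write a tab-delimited table to `path` (Python 3 style)."""
    # Text mode plus newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='\t')
        writer.writerow(header)   # header row
        writer.writerows(rows)    # one row per ligand

# Hypothetical data shaped like the example in the question.
header = ['Ligand', 'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']
rows = [['345647', 'Not Found', 'Not Found', 'Found']]
write_table('output_demo.csv', header, rows)
```

Reading the file back with csv.reader and the same delimiter should return the header followed by the rows.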

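Update

As I said in a comment, this could be done faster by reading each protein file only once. To produce the output in the format you want, the result of checking each ligand against each file has to be stored incrementally in a data structure: each file is read once and then checked against every ligand. Only after all of the files have been processed are the results written out in one go. A simple list of lists is sufficient for this purpose, and is what the implementation below uses.

The trade-off is using more memory versus reading and re-reading the protein files. Since disk I/O is generally one of the slowest things on a computer, the potentially large performance gain is probably worth the slight increase in code complexity.

Here's code showing this alternative version: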

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# initialize list of lists holding each ligand and its presence in each receptor
with open('potentiation.txt') as experiment:
    ligands = [[ligand] for ligand in (line.rstrip() for line in experiment)]

for protein in receptors:
    with open(protein + '.txt') as protein_file:
        protein_file_data = protein_file.read()
        for row in ligands:
            # determine if this ligand (row[0]) appears in protein data
            row.append('Found' if row[0] in protein_file_data else 'Not Found')

with open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    csv_writer.writerows(ligands)

print('output.csv file written')
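
The read-once logic can be tried without touching the disk by separating the matching step from the file I/O. The sketch below assumes Python 3; the function name `presence_table` and the sample strings are hypothetical stand-ins for the ligand list and the contents of the receptor files.

```python
def presence_table(ligands, receptor_texts):
    """Return one row per ligand: [ligand, status for each receptor text]."""
    rows = [[ligand] for ligand in ligands]
    # Each receptor text is visited once and checked against every ligand.
    for text in receptor_texts:
        for row in rows:
            row.append('Found' if row[0] in text else 'Not Found')
    return rows

# Hypothetical stand-ins for the contents of two receptor files.
receptor_texts = ['100001 200002', '345647 100001']
table = presence_table(['345647', '100001'], receptor_texts)
print(table)
```

Note that both `in` and `str.find()` do plain substring matching, so a ligand ID also matches when it is embedded inside a longer number in the file.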

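You can save the results in lists (one list for ligands, one for proteins), after adding the "protein" and "ligand" values to the appropriate list (at index 0). After that it is easy to save them to a text file.
To save, open a file for writing and convert the list into a string: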

my_string = " ".join(map(str, lst))

Then save my_string (and do this for each list).
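
One way to read this suggestion is sketched below, assuming Python 3. The helper `save_list`, the file names, and the sample values are hypothetical; the answer itself only specifies the `join` step.

```python
def save_list(path, lst, sep=' '):
    """Join the items of `lst` into one string and write it to `path`."""
    my_string = sep.join(map(str, lst))
    with open(path, 'w') as out:
        out.write(my_string + '\n')
    return my_string

# Hypothetical lists of the kind the answer describes.
ligands = ['345647', '345648']
proteins = ['nci5_moe0013_1', 'nci5_moe0014_1']
save_list('ligands_demo.txt', ligands)
save_list('proteins_demo.txt', proteins)
```

Passing sep='\t' would give a tab-delimited line instead of a space-separated one.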

