How can I write output from a for loop in Python into a CSV-formatted file?

The following Python script identifies whether certain words are found in a list of different files.

experiment=open('potentiation.txt')
lines=experiment.read().splitlines()
receptors=['crystal_1.txt', 'modeller_1.txt', 'moe_1.txt',
           'nci5_modeller0000_1.txt', 'nci5_modeller0001_1.txt',
           'nci5_modeller0002_1.txt', 'nci5_modeller0003_1.txt',
           'nci5_modeller0004_1.txt', 'nci5_modeller0005_1.txt',
           'nci5_modeller0006_1.txt', 'nci5_modeller0007_1.txt',
           'nci5_modeller0008_1.txt', 'nci5_modeller0009_1.txt',
           'nci5_modeller0010_1.txt', 'nci5_modeller0011_1.txt',
           'nci5_moe0000_1.txt', 'nci5_moe0001_1.txt', 'nci5_moe0002_1.txt',
           'nci5_moe0003_1.txt', 'nci5_moe0004_1.txt', 'nci5_moe0005_1.txt',
           'nci5_moe0006_1.txt', 'nci5_moe0007_1.txt', 'nci5_moe0008_1.txt',
           'nci5_moe0009_1.txt', 'nci5_moe0010_1.txt', 'nci5_moe0011_1.txt',
           'nci5_moe0012_1.txt', 'nci5_moe0013_1.txt', 'nci5_moe0014_1.txt']

for ligand in lines:
    for protein in receptors:
        file1=open(protein,"r")
        read1=file1.read()
        find_hit=read1.find(ligand)
        if find_hit == -1:
            print ligand,protein,"Not Found"
        else:
            print ligand,protein, "Found"

An example of the output of this code is below:

345647 nci5_moe0012_1.txt Not Found
345647 nci5_moe0013_1.txt Not Found
345647 nci5_moe0014_1.txt Found

My question is: how can I take this output and format it into a CSV file that looks like the example below?

Ligand  nci5_moe0012_1  nci5_moe0013_1  nci5_moe0014_1
345647  Not Found       Not Found       Found

I think something like this would do it (assuming your output file is tab-delimited):

import csv
import os

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

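# Python 2: the csv module expects the output file opened in binary mode ('wb').
# On Python 3, use open('output.csv', 'w', newline='') instead.
with open('potentiation.txt', 'rt') as experiment, \
     open('output.csv', 'wb') as outfile: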
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    for ligand in (line.rstrip() for line in experiment):
        row = [ligand]
        for protein in receptors:
            with open(protein+'.txt', "rt") as file1:
                # the comparison is True (index 1, 'Not Found') when find() returns -1
                found = ['Found', 'Not Found'][file1.read().find(ligand) == -1]
                row.append(found)
        csv_writer.writerow(row)

print('output.csv file written')

Update

As I said in a comment, this could be done a lot faster by reading each protein file only once. To do that and still format the output the way you want, the results of checking for each ligand in each file need to be stored in a data structure that is built up incrementally as each file is read and checked against every ligand, and then written out all at once after all the files have been processed. A simple list of lists is adequate for this purpose and is used in the implementation below.

The trade-off is using more memory versus reading and rereading the protein files over and over. Since disk I/O is often one of the slowest things on a computer, the potentially large performance gain for only a slight increase in code complexity is probably worthwhile.

Here's the code showing this alternative version:

import csv
import os

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# initialize list of lists holding each ligand and its presence in each receptor
with open('potentiation.txt') as experiment:
    ligands = [[ligand] for ligand in (line.rstrip() for line in experiment)]

for protein in receptors:
    with open(protein + '.txt') as protein_file:
        protein_file_data = protein_file.read()
        for row in ligands:
            # determine if this ligand (row[0]) appears in protein data
            row.append('Found' if row[0] in protein_file_data else 'Not Found')

# Python 2 binary mode; on Python 3 use open('output.csv', 'w', newline='')
with open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    csv_writer.writerows(ligands)

print('output.csv file written')
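
For anyone running this on Python 3, here is a minimal sketch of the same read-each-file-once idea wrapped in a function. The function name `write_presence_table` and its parameters are made up for illustration and are not part of the answer above. It also skips blank lines in the ligand file, which is an assumption on my part: an empty string is a substring of every file, so a blank line would otherwise be reported as "Found" everywhere.

```python
import csv
import os


def write_presence_table(ligand_path, receptors, out_path, folder='.'):
    """Write a tab-delimited Found/Not Found table, reading each receptor file once.

    Hypothetical helper: names and parameters are illustrative only.
    """
    with open(ligand_path) as experiment:
        # One row per non-blank ligand line; row[0] holds the ligand itself.
        rows = [[line.strip()] for line in experiment if line.strip()]

    for protein in receptors:
        with open(os.path.join(folder, protein + '.txt')) as protein_file:
            data = protein_file.read()
        for row in rows:
            # Plain substring test, same as in the code above.
            row.append('Found' if row[0] in data else 'Not Found')

    # Python 3: text mode with newline='' takes the place of Python 2's 'wb'.
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='\t')
        writer.writerow(['Ligand'] + list(receptors))  # header row
        writer.writerows(rows)
    return rows
```

Calling `write_presence_table('potentiation.txt', receptors, 'output.csv')` with the `receptors` list from above should produce the same layout as the Python 2 version.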

You can save your results in lists (one list for the ligand, one for the proteins), after adding "Protein" and the value of "Ligand" to the appropriate list (at index 0). After that, it's easy to save them to a text file.
To save, open a file for writing and convert each list to a string:

my_string = " ".join(map(str, lst))

and then save my_string (and do this for each list).
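
A rough sketch of what that could look like end to end. The helper name `save_lists` and the sample values are only illustrative. Note that it joins with a tab rather than a space, which is my own change: "Not Found" itself contains a space, so space-separated fields would be ambiguous when read back.

```python
def save_lists(lists, path, sep="\t"):
    """Write each list as one line of text, joining its items with `sep`."""
    with open(path, "w") as out:
        for lst in lists:
            out.write(sep.join(map(str, lst)) + "\n")


if __name__ == "__main__":
    # Illustrative data only: index 0 holds the label, the rest are results.
    header = ["Ligand", "nci5_moe0012_1", "nci5_moe0013_1", "nci5_moe0014_1"]
    result = [345647, "Not Found", "Not Found", "Found"]
    save_lists([header, result], "output.txt")
```

This does no quoting or escaping, so it is only safe when the values never contain the separator; the csv module handles those cases for you.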
