
How to only use part of a file in Python?

I have been trying to use conditions to print only a section of a file, but for some reason, when I run the code in IPython, it just runs continually and never stops.

The file I am running it on is:

Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.

pilercr v1.06
By Robert C. Edgar

Temp1.None.fasta: 523 putative CRISPR arrays found.



DETAIL REPORT



Array 1
>contig-856000000 902 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                      Spacer
==========  ======  ======  ======  ==========    ========================================    ======
        28      40    95.0      26  TGCTTCCCCG    -.....................................T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T....................................C.    GTCAGCGTGTAGCGACTGTATCTGG
       159      40   100.0          CTGTATCTGG    ........................................    TTGCTCGAA
==========  ======  ======  ======  ==========    ========================================
         3      40              25                TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG


Array 2
>contig-2277000000 590 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
        19      37   100.0      37  GAGGGTGAGG    .....................................    ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG
        93      37   100.0      37  TCTGTAATAG    .....................................    ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT
       167      37   100.0      35  AATATTTGGT    .....................................    TTCTCGAACGTTCCATGCTTCATAATATACCTCCT
       239      37   100.0      39  TATACCTCCT    .....................................    CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA
       315      37   100.0          AGCCAGGTAA    .....................................    CGTCAGTCATG
==========  ======  ======  ======  ==========    =====================================
         5      37              37                GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC


Array 3
>contig-2766000000 540 nucleotides

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                   Spacer
==========  ======  ======  ======  ==========    =====================================    ======
       172      37   100.0      29  GTTTTAGATG    .....................................    TATCGTAGCATCCCACTCCCCTGGTGTAA
       238      37   100.0      29  CCTGGTGTAA    .....................................    GTTGGACGCGCTGCTGGACGATAGGCTGC
       304      37    97.3      29  GATAGGCTGC    T....................................    ACGCCTTACAAGCTGACCCGCGCCCAATT
       370      37   100.0          GCGCCCAATT    .....................................    GTACCTTGTTC
==========  ======  ======  ======  ==========    =====================================
         4      37              29                GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG


SUMMARY BY SIMILARITY



Array          Sequence    Position      Length  # Copies  Repeat  Spacer  +  Consensus
=====  ================  ==========  ==========  ========  ======  ======  =  =========
    5  contig-504300000          18         364         6      33      33  +  --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
    8  contig-974700000          15         229         4      32      33  -  --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC-----
   12  contig-759000001         464         503         8      33      34  +  --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT-----
   16  contig-293000000          77         406         6      37      36  -  -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC-----
   17  contig-457600000          28         416         6      37      38  -  -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC-----
   18  contig-527300000           1         351         6      33      32  +  -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC-----
   27  contig-132220000          21         234         4      33      34  +  --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT-----
   36  contig-602400000          35         304         5      33      34  -  --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC-----
   38  contig-124860000         131         232         4      32      34  +  --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC-----
   54  contig-979400000         138         231         4      32      34  -  --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC-----
   61  contig-992000005         149         693        11      30      36  -  --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT-----
   68  contig-103110000          37         238         4      34      34  +  -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA----
   73  contig-372900000        1627        1013        16      30      35  +  ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT-----------

And my code so far is:

fname = 'crispr_pilrcr_1.out'
start=False
end=False
counter = 0
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Remove empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Removes separation lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            start=True
            print 'Starting'
        if s[0] == 'SUMMARY': # Only end once this section has ended
            end=True
            print 'Ending'
        while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences 
            try:
                int(s[0])
                print s[7]
            except ValueError:
                continue
    except ValueError:
        continue

I figure there is likely something wrong with the 'while' loop; however, the same continual running occurred when I used 'and' instead of 'or'.

As I said, I want to select the part of the file between 'DETAIL REPORT' and 'SUMMARY BY SIMILARITY', which is why I set the conditions to start trying once those headings are found.

Any help you guys can provide would be great.

Thanks, Tom

Consider something like:

fname = 'crispr_pilrcr_1.out'
printing = False
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Skip empty lines, which would otherwise cause errors
    if '==' in s[0]: continue # Skip separator lines, which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            printing = True
            print('Starting')
        elif s[0] == 'SUMMARY': # Only end once this section has ended
            printing = False
            print('Ending')
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                #   between DETAIL... and SUMMARY...
                pass # Placeholder so the block is valid Python; replace with your own processing
            except ValueError:
                continue
    except ValueError:
        continue

Basically, you're using a single variable, printing, which is initialized to False, set to True when the for loop encounters "DETAIL...", and reset to False when the for loop encounters "SUMMARY...".

For the lines that match neither "DETAIL..." nor "SUMMARY...", and only while printing is True (i.e. for the lines between the two headings), your try block will be executed.
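As a rough, self-contained sketch of the same flag idea, the function below yields the split fields of every numeric row between the two headings. The names detail_rows and SAMPLE are invented for this illustration, and SAMPLE is only a trimmed excerpt of the output shown in the question, so treat it as a starting point rather than a finished parser.

```python
# Trimmed excerpt of the PILER-CR output from the question (illustration only).
SAMPLE = """\
DETAIL REPORT

Array 1
>contig-856000000 902 nucleotides

==========  ======  ======  ======  ==========    ========================================    ======
        28      40    95.0      26  TGCTTCCCCG    -.....................................T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T....................................C.    GTCAGCGTGTAGCGACTGTATCTGG

SUMMARY BY SIMILARITY

    5  contig-504300000          18         364         6      33      33  +  --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
"""


def detail_rows(lines):
    """Yield the fields of numeric rows found between DETAIL and SUMMARY."""
    printing = False
    for line in lines:
        s = line.split()
        if not s or '==' in s[0]:
            continue  # skip blank lines and '=====' separator lines
        if s[0] == 'DETAIL':
            printing = True
        elif s[0] == 'SUMMARY':
            printing = False
        elif printing:
            try:
                int(s[0])  # only rows that start with a number are data rows
            except ValueError:
                continue
            yield s


for fields in detail_rows(SAMPLE.splitlines()):
    print(fields[-1])  # last column of each row in the DETAIL section
```

To run it on the real file, something like detail_rows(open(fname)) should work, since a file object can be iterated line by line just like the list used here.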

The problem is that you never change the values of start or end inside your while loop. So, whatever values they had that allowed you to get into the loop will be the same on every iteration, and the loop never exits.
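A small sketch of that effect, using a hypothetical helper count_iterations and an iteration cap that is there purely so the demonstration terminates. Inside a while loop, continue jumps back to the while test rather than to the next line of the file, and because neither flag has changed, the test passes again.

```python
def count_iterations(first_field, limit=5):
    """Count how often the question's while loop spins on one line.

    `limit` is an artificial safety cap (not part of the original code);
    without it the loop would never return.
    """
    start = False
    end = False
    iterations = 0
    while start == True or end == False:  # same condition as in the question
        iterations += 1
        if iterations >= limit:
            break  # safety cap for the demonstration only
        try:
            int(first_field)
        except ValueError:
            continue  # goes back to the while test, not to the next line
    return iterations


# 'Use' is the first word of the first line of the file in the question.
print(count_iterations('Use'))
# A numeric first field does not help either: nothing changes the flags.
print(count_iterations('28'))
```

Both calls stop only because of the cap, which is the behaviour seen in IPython: the very first non-empty line of the file already traps the loop.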

Without completely overhauling your logic, I'd guess that you probably want to do something like:

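while start or not end:
    try:
        int(s[0]) # Raises ValueError when the line does not start with a number
        print(s[7])
    except ValueError:
        # Update the flags so that the loop condition can become False
        end = True
        start = False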
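Going one step further than the snippet above, here is a sketch that keeps the question's start and end flags but replaces the while with a plain if, on the assumption that the for loop is already what moves from line to line. It also assumes, based only on the sample shown in the question, that rows carrying a spacer split into at least six fields with the spacer last, while the consensus row under each array has fewer. The names collect_spacers and SAMPLE are made up for this example.

```python
# Trimmed excerpt of the PILER-CR output from the question (illustration only).
SAMPLE = """\
DETAIL REPORT

Array 1
>contig-856000000 902 nucleotides

==========  ======  ======  ======  ==========    ========================================    ======
        28      40    95.0      26  TGCTTCCCCG    -.....................................T.    CTTGGTCTTGCTGGTTCTCACCGACT
        94      40    95.0      25  CTCACCGACT    .T....................................C.    GTCAGCGTGTAGCGACTGTATCTGG
       159      40   100.0          CTGTATCTGG    ........................................    TTGCTCGAA
==========  ======  ======  ======  ==========    ========================================
         3      40              25                TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG

SUMMARY BY SIMILARITY

    5  contig-504300000          18         364         6      33      33  +  --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
"""


def collect_spacers(lines):
    """Return the last field of each spacer row in the DETAIL section."""
    start = False
    end = False
    spacers = []
    for line in lines:
        s = line.split()
        if not s or '==' in s[0]:
            continue  # skip blank lines and '=====' separator lines
        if s[0] == 'DETAIL':
            start = True
        elif s[0] == 'SUMMARY':
            end = True
        elif start and not end:  # an 'if' test: the for loop advances the line
            try:
                int(s[0])
            except ValueError:
                continue  # here 'continue' moves on to the next line
            # Assumption from the sample: spacer rows have 6 or 7 fields,
            # the per-array consensus row has only 4.
            if len(s) >= 6:
                spacers.append(s[-1])
    return spacers


print(collect_spacers(SAMPLE.splitlines()))
```

Note that in the sample a full data row splits into seven fields, so s[7] would raise an IndexError there; taking s[-1] sidesteps the question of which index holds the spacer.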
