How to only use part of a file in Python?
I have been trying to use conditions to print only a section of a file, but for some reason, when I run the code in IPython, it just keeps running and never stops.
The file I am running it on is:
Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.
pilercr v1.06
By Robert C. Edgar
Temp1.None.fasta: 523 putative CRISPR arrays found.
DETAIL REPORT
Array 1
>contig-856000000 902 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ======================================== ======
28 40 95.0 26 TGCTTCCCCG -.....................................T. CTTGGTCTTGCTGGTTCTCACCGACT
94 40 95.0 25 CTCACCGACT .T....................................C. GTCAGCGTGTAGCGACTGTATCTGG
159 40 100.0 CTGTATCTGG ........................................ TTGCTCGAA
========== ====== ====== ====== ========== ========================================
3 40 25 TAGTTGTGAATAGCTGACAAAATCATATCATATACAACAG
Array 2
>contig-2277000000 590 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ===================================== ======
19 37 100.0 37 GAGGGTGAGG ..................................... ACTTTAGGTTCAAATCCGTAGAGCTGATCTGTAATAG
93 37 100.0 37 TCTGTAATAG ..................................... ATTCCGTTGTTGAAATAAAGTATGAATAATATTTGGT
167 37 100.0 35 AATATTTGGT ..................................... TTCTCGAACGTTCCATGCTTCATAATATACCTCCT
239 37 100.0 39 TATACCTCCT ..................................... CTGATGAATCTTACCTCGTACAGTGATGTAGCCAGGTAA
315 37 100.0 AGCCAGGTAA ..................................... CGTCAGTCATG
========== ====== ====== ====== ========== =====================================
5 37 37 GTAGAAATGAGACGTCCGCTGTAAAGGACATTGATAC
Array 3
>contig-2766000000 540 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ===================================== ======
172 37 100.0 29 GTTTTAGATG ..................................... TATCGTAGCATCCCACTCCCCTGGTGTAA
238 37 100.0 29 CCTGGTGTAA ..................................... GTTGGACGCGCTGCTGGACGATAGGCTGC
304 37 97.3 29 GATAGGCTGC T.................................... ACGCCTTACAAGCTGACCCGCGCCCAATT
370 37 100.0 GCGCCCAATT ..................................... GTACCTTGTTC
========== ====== ====== ====== ========== =====================================
4 37 29 GGCTGTAAAAAGCCACCAAAATGATGGTAATTACAAG
SUMMARY BY SIMILARITY
Array Sequence Position Length # Copies Repeat Spacer + Consensus
===== ================ ========== ========== ======== ====== ====== = =========
5 contig-504300000 18 364 6 33 33 + --------------------------GTCGCT-C---CCCGCATGGGGAGCG--T-GGATTGAAAT-----
8 contig-974700000 15 229 4 32 33 - --------------------------GTCGCC-C---CCCATGCG-GGGGCG--T-GGATTGAAAC-----
12 contig-759000001 464 503 8 33 34 + --------------------------GTCGCT-C---CCTTTACGGGGAGCG--T-GGATTGAAAT-----
16 contig-293000000 77 406 6 37 36 - -----------------------GTAGAAATGAG---TTCCCCGATGAGAAG--G-GGATTGACAC-----
17 contig-457600000 28 416 6 37 38 - -----------------------GTAGAAATGGG---TGTCCCGATAGATAG--G-GGATTGACAC-----
18 contig-527300000 1 351 6 33 32 + -----------------------ATCGCG----C---CCCCACGGGGGCGTG--T-GAATTGAAAC-----
27 contig-132220000 21 234 4 33 34 + --------------------------GTCGCT-C---CCTTCACGGGGAGCG--T-GGATTGAAAT-----
36 contig-602400000 35 304 5 33 34 - --------------------------GTCGCC-C---CCCACGTGGGGGGCG--T-GGATTGAAAC-----
38 contig-124860000 131 232 4 32 34 + --------------------------GTCGCA-C---CCCTCGC-GGGTGCG--T-GGATTGAAAC-----
54 contig-979400000 138 231 4 32 34 - --------------------------GTCGCC-C---CTCTTGCA-GGGGCG--T-GGATTGAAAC-----
61 contig-992000005 149 693 11 30 36 - --------------------GTTAAAATCA--GA---CC---ATTTTG--------GGATTGAAAT-----
68 contig-103110000 37 238 4 34 34 + -----------------------GTCGTC----C---CCCACACGGGGGACG--T-GGATTGAAATA----
73 contig-372900000 1627 1013 16 30 35 + ----------------------------ATTAGAATCGTACTT--ATGTAGAATTGAAAT-----------
And my code so far is:
fname = 'crispr_pilrcr_1.out'
start=False
end=False
counter = 0
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Skip empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Skip separator lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            start=True
            print 'Starting'
        if s[0] == 'SUMMARY': # Only end once this section has ended
            end=True
            print 'Ending'
        while start==True or end==False: # Whilst in the section of the PILER-CR output which provides spacer sequences
            try:
                int(s[0])
                print s[7]
            except ValueError:
                continue
    except ValueError:
        continue
I figure there is likely something wrong with the 'while' loop; however, the same endless running occurred when I used 'and' instead of 'or'.
As I said, I want to select the part of the file between 'DETAIL REPORT' and 'SUMMARY BY SIMILARITY', which is why I set the conditions so that the try block only runs once they are found.
Any help you guys can provide would be great.
Thanks, Tom
Consider something like:
fname = 'crispr_pilrcr_1.out'
counter = 0
printing = False
for line in open(fname, 'r'): # Open up the file
    s = line.split() # Split each line into words
    if not s: continue # Skip empty lines which would otherwise cause errors
    if '==' in s[0]: continue # Skip separator lines which consist of long '=======' strings
    try:
        if s[0] == 'DETAIL': # Only start in the section which starts with 'DETAIL'
            printing = True
            print 'Starting'
        elif s[0] == 'SUMMARY': # Only end once this section has ended
            printing = False
            print 'Ending'
        elif printing:
            try:
                # Anything you put here will only be called for the lines
                # between DETAIL... and SUMMARY...
                pass # placeholder so the empty try block is valid syntax
            except ValueError:
                continue
    except ValueError:
        continue
Basically, you're using a single variable, printing, which is initialized to False, set to True when the for loop encounters "DETAIL...", and reset to False when the for loop encounters "SUMMARY...".
For the lines that don't match "DETAIL..." or "SUMMARY...", and if printing is True (i.e. for the lines between the two headings), your try block will be executed.
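As a rough, self-contained illustration of the single-flag idea, the sketch below wraps it in a function that collects values instead of printing them. The function name spacers_in_detail_report is made up for this example, and so is the choice of column: the sample rows shown in the question split into seven whitespace-separated fields, so the sketch assumes the spacer is the last one (index 6) and keeps only rows with exactly seven fields. If the real file is laid out differently, that check needs adjusting.

```python
def spacers_in_detail_report(lines):
    """Collect the last field of each 7-field numeric row between the headings.

    Hypothetical helper: the 7-field / index-6 rule is an assumption based on
    the sample rows in the question, not something PILER-CR guarantees.
    """
    printing = False  # True only while we are inside the DETAIL REPORT section
    spacers = []
    for line in lines:
        s = line.split()
        if not s or '==' in s[0]:
            continue  # skip blank lines and '=====' separator rows
        if s[0] == 'DETAIL':
            printing = True
        elif s[0] == 'SUMMARY':
            printing = False
        elif printing and s[0].isdigit() and len(s) == 7:
            spacers.append(s[6])
    return spacers


# A shortened stand-in for the file in the question (dot runs abbreviated).
sample = """\
pilercr v1.06
DETAIL REPORT
Array 1
>contig-856000000 902 nucleotides
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ========= ======
28 40 95.0 26 TGCTTCCCCG -.......T. CTTGGTCTTGCTGGTTCTCACCGACT
94 40 95.0 25 CTCACCGACT .T......C. GTCAGCGTGTAGCGACTGTATCTGG
159 40 100.0 CTGTATCTGG .......... TTGCTCGAA
========== ====== ====== ====== ========== =========
3 40 25 TAGTTGTGAATAGC
SUMMARY BY SIMILARITY
5 contig-504300000 18 364 6 33 33 + ---GTCGCT---
""".splitlines()

print(spacers_in_detail_report(sample))
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.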
The problem is that you never change the values of start or end inside your while loop. So, whatever values they had that allowed you to get into the loop will be the same on every iteration.
Without completely overhauling your logic, I'd guess that you probably want to do something like:
while start or not end:
    try:
        int(s[0])
        print s[7]
    except ValueError:
        end = True
        start = False
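One way to see the point about the loop condition in isolation is the sketch below. It keeps the question's two flags but tests them with an if rather than a while, on the assumption that the enclosing for loop is what should move on to the next line. Inside the loop body s never changes, so a while that is true once has nothing to make it false. It also uses "start and not end", since with "or" the condition already holds before DETAIL is reached (end is still False). The function name collect_between and its parameters are invented for this illustration, and it returns the split rows rather than picking a column.

```python
def collect_between(lines, start_word='DETAIL', end_word='SUMMARY'):
    """Return the split numeric rows found after start_word and before end_word."""
    start = False
    end = False
    found = []
    for line in lines:
        s = line.split()
        if not s or '==' in s[0]:
            continue  # skip blank lines and '=====' separator rows
        if s[0] == start_word:
            start = True
        if s[0] == end_word:
            end = True
        # 'if', not 'while': the for loop supplies the next line, so each
        # line is checked once and the flags are re-read on the next pass.
        if start and not end:
            try:
                int(s[0])
            except ValueError:
                continue  # not a data row (heading, 'Array 1', '>contig...', ...)
            found.append(s)
    return found


rows = collect_between([
    'header text',
    'DETAIL REPORT',
    '28 40 95.0',
    'Array 1',
    '94 40',
    'SUMMARY BY SIMILARITY',
    '5 contig',
])
print(rows)
```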