
How can I count the number of lines between two markers in a file with Python?

Hi, I'm new to Python and I'm using Python 3.2. I have a file with a format like this:

Number of segment pairs = 108570; number of pairwise comparisons = 54234
'+' means given segment; '-' means reverse complement

Overlaps            Containments  No. of Constraints Supporting Overlap

******************* Contig 1 ********************

 E_180+

 E_97-

******************* Contig 2 ********************

E_254+

                    E_264+ is in E_254+

E_276+

******************* Contig 3 ********************

E_256-

E_179-

I want to count the number of non-empty lines between the ***** Contig # ***** markers, and I want to get a result like this:

contig1=2
contig2=3
contig3=2

Probably, it's best to use regular expressions here. You can try the following:

import re
# Read the whole file; 'input.txt' is a placeholder for your file's path.
with open('input.txt') as f:
    text = f.read()
pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)

pairs is a list of tuples of the form ('Contig x', '...'). The second component of each tuple contains the text that follows the marker.

Afterwards, you can count the number of '\n' characters in those texts; the easiest way is a list comprehension:
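To make the shape of pairs concrete, here is a small sketch that runs the same pattern on an inline string instead of a file. The variable sample and its contents are assumptions made for illustration, modeled on the format in the question.

```python
import re

# Hypothetical inline sample standing in for the file contents.
sample = (
    "******************* Contig 1 ********************\n"
    " E_180+\n"
    " E_97-\n"
    "******************* Contig 2 ********************\n"
    "E_254+\n"
    "                    E_264+ is in E_254+\n"
    "E_276+\n"
)

# Group 1 captures the contig name, group 2 everything up to the next '*'.
pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', sample)
for contig, txt in pairs:
    print(contig, repr(txt))
```

Note that the second group stops at the first '*' it meets, so this approach assumes the data lines themselves never contain an asterisk.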

[(contig, txt.count('\n')) for (contig,txt) in pairs]

(Edit: if you don't want to count empty lines, count only the lines that contain non-whitespace characters:

[(contig, sum(1 for line in txt.splitlines() if line.strip())) for (contig,txt) in pairs]

)
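Putting the pieces of this answer together, a minimal sketch might look like the following. It counts only lines that contain non-whitespace characters, so the result does not depend on how blank lines are laid out. The function name count_nonempty and the inline sample are assumptions for illustration; in practice the text would come from reading the file.

```python
import re

def count_nonempty(text):
    """Return [(contig, n)], n being the number of non-blank lines per block."""
    pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)
    return [(contig, sum(1 for line in txt.splitlines() if line.strip()))
            for contig, txt in pairs]

# Hypothetical sample with blank lines between entries, as in the question.
sample = (
    "******************* Contig 1 ********************\n\n"
    " E_180+\n\n"
    " E_97-\n\n"
    "******************* Contig 2 ********************\n\n"
    "E_254+\n\n"
    "                    E_264+ is in E_254+\n\n"
    "E_276+\n\n"
    "******************* Contig 3 ********************\n\n"
    "E_256-\n\n"
    "E_179-\n"
)

for contig, n in count_nonempty(sample):
    # Turn 'Contig 1' into 'contig1' to match the requested output format.
    print('{0}={1}'.format(contig.replace(' ', '').lower(), n))
```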

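def give(filename):
    with open(filename) as f:
        # Skip everything up to the first 'Contig' header.
        for line in f:
            if 'Contig' in line:
                category = line.strip('* \r\n')
                break
        cnt = 0    # non-empty lines seen in the current contig
        aim = []   # 'is in' (containment) lines of the current contig
        for line in f:
            if 'Contig' in line:
                # A new header closes the previous contig.
                yield (category + '=' + str(cnt), aim)
                category = line.strip('* \r\n')
                cnt = 0
                aim = []
            elif line.strip():
                cnt += 1
                if 'is in' in line:
                    aim.append(line.strip())
        # Emit the last contig, which has no header after it.
        yield (category + '=' + str(cnt), aim)


for a, b in give('input.txt'):
    print(a)
    if b:
        print(b)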

Result:

Contig 1=2
Contig 2=3
['E_264+ is in E_254+']
Contig 3=2

The function give() isn't a normal function; it is a generator function. See the documentation, and if you have questions, I will answer.

strip() is a method that removes characters from the beginning and the end of a string.
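As a brief illustration of what "generator function" means: calling it does not run the body; it returns an iterator, and the body then runs step by step, pausing at each yield. The function countdown below is a made-up example, not part of the code above.

```python
def countdown(n):
    # Execution pauses at each yield and resumes when the next value is requested.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)   # nothing has run yet; gen is a generator object
first = next(gen)    # runs the body up to the first yield
rest = list(gen)     # consumes the remaining values
print(first, rest)
```

This is why give() can be used directly in a for loop: each iteration resumes the function until it reaches the next yield.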

When used without an argument, strip() removes whitespace (that is, \f, \n, \r, \t, \v and the space character) from both ends of the string. When a string is passed as the argument, every leading and trailing character that appears in that argument is removed, stopping at the first character that is not in it; characters in the middle of the string are left alone. The order of characters in the argument doesn't matter: the argument designates not a string but a set of characters to remove.

line.strip() is a way to test whether a line contains any non-whitespace characters: the result is an empty string, which is false, when the line is blank.

The position of elif line.strip(): after the line if 'Contig' in line: , and the fact that it is written elif and not if , is important: otherwise line.strip() would also be true for a header line such as
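A few calls make this behaviour concrete. The header string below is a shortened stand-in for the lines in the question, chosen for illustration.

```python
line = '******** Contig 2 *********\r\n'

# No argument: only leading and trailing whitespace is removed.
no_arg = line.strip()

# With an argument: it is treated as a set of characters to strip from both ends.
with_arg = line.strip('* \r\n')
reordered = line.strip('\n\r *')   # same set in another order, same result

# Characters in the middle of the string are left alone.
middle = '**a*b**'.strip('*')

print(repr(no_arg), repr(with_arg), repr(reordered), repr(middle))
```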

******** Contig 2 *********\n
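A short sketch of why the order matters, using a hypothetical helper classify that is not part of the code above: a header line is non-empty after strip(), so if the non-empty test came first, or were a separate if, headers would be counted as data lines.

```python
def classify(line):
    # The header test must come first: a header is also a non-blank line.
    if 'Contig' in line:
        return 'header'
    elif line.strip():
        return 'data'
    return 'blank'

header = '******** Contig 2 *********\n'
header_is_nonblank = bool(header.strip())
kinds = [classify(header), classify(' E_97-\n'), classify('\n')]
print(header_is_nonblank, kinds)
```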

I suppose you will also be interested in the content of lines like this one:

            E_264+ is in E_254+

because these are the lines that make a difference in the counts. So I edited my code so that the function give() also yields these lines.
