
Function does not work properly in for loop - Python

I have a text file in the following format:

AAAAATTTTTT
AAATTTTTTGGG
TTTDDDCCVVVVV

For each line, I am trying to count how many times the first character repeats consecutively at the start of the line, and how many times the last character repeats consecutively at the end.

I have written the following function:

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    startCount = 0
    endCount = 0

    for char in sequence:
        if char == start:
            startCount += 1
            if ( char != start):
                break

    for char in reversed(sequence):
        if char == end:
            endCount += 1
            if ( char != end):
                break

    return startCount, endCount

The function works when called directly on a string. For example:

seq = "TTTDDDCCVVVVV"
a,b = getStartEnd(seq)
print a,b

But when I call it inside a for loop over the file, it gives the correct values only for the last line of the file.

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(str(line))
    print a, b

This happens because every line except the last ends with a newline character, so sequence[-1] is '\n' rather than the last visible character.

Try the following, which strips trailing whitespace (including the newline) from each line:

with open("Test.txt", 'r') as f:
    for line in f:
        a, b = getStartEnd(line.rstrip())
        print a, b
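To see the problem directly, it can help to print repr() of each line. The sketch below uses io.StringIO as a stand-in for Test.txt (an assumption, so the example runs without a file on disk) and is written for Python 3, so print is called as a function. The names fake_file, raw_last and stripped_last are made up for this illustration.

```python
import io

# Stand-in for Test.txt; the last line deliberately has no trailing newline.
fake_file = io.StringIO("AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV")

raw_last = []       # last character of each line as read
stripped_last = []  # last character after rstrip()

for line in fake_file:
    print(repr(line))  # repr makes the trailing '\n' visible
    raw_last.append(line[-1])
    stripped_last.append(line.rstrip()[-1])

print(raw_last)       # ['\n', '\n', 'V']
print(stripped_last)  # ['T', 'G', 'V']
```

Only the last line ends in a visible character, which matches the symptom described in the question.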

By the way, ( char != end ) in the following code is always False, because that check is only reached when char == end. The same applies to ( char != start ).

for char in reversed(sequence):
    if char == end:
        endCount += 1
        if ( char != end): # always False because char == end
            break

Do you mean this?

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break
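The sample lines happen to hide this bug, because in each of them the first and last characters never reappear later in the line. A sketch with a hypothetical input, "AATAA", shows the difference. The names count_original and count_fixed are made up for this comparison, and the code is written for Python 3.

```python
def count_original(sequence):
    # Logic from the question: the inner break is unreachable,
    # so every matching character in the whole string is counted.
    start, end = sequence[0], sequence[-1]
    start_count = end_count = 0
    for char in sequence:
        if char == start:
            start_count += 1
            if char != start:
                break
    for char in reversed(sequence):
        if char == end:
            end_count += 1
            if char != end:
                break
    return start_count, end_count


def count_fixed(sequence):
    # Stop at the first character that differs from the start/end character.
    start, end = sequence[0], sequence[-1]
    start_count = end_count = 0
    for char in sequence:
        if char == start:
            start_count += 1
        else:
            break
    for char in reversed(sequence):
        if char == end:
            end_count += 1
        else:
            break
    return start_count, end_count


print(count_original("AATAA"))  # (4, 4)
print(count_fixed("AATAA"))     # (2, 2)
```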

How about using itertools.takewhile:

import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count
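A different way to get the same counts, not part of the answer above, is to compare string lengths before and after stripping the repeated character. This is only a sketch: get_start_end is a hypothetical name, and the empty-string guard is an added assumption about how blank input should be handled.

```python
def get_start_end(sequence):
    # Hypothetical variant of getStartEnd; returns (0, 0) for an empty string.
    if not sequence:
        return 0, 0
    # lstrip/rstrip remove the leading/trailing run of the given character,
    # so the length difference is the length of that run.
    start_count = len(sequence) - len(sequence.lstrip(sequence[0]))
    end_count = len(sequence) - len(sequence.rstrip(sequence[-1]))
    return start_count, end_count


print(get_start_end("TTTDDDCCVVVVV"))  # (3, 5)
print(get_start_end("AAAA"))           # (4, 4)
```

When the whole string is a single repeated character, both counts equal the string length, which is also what the takewhile version returns.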

Three things. First, in your function, you probably meant to break out of each loop using the following structure:

for char in sequence:
    if char == start:
        startCount += 1
    else:
        break

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break

Second, when you are looping through the lines in your file, you don't need to convert the lines to strings with the str function. They already are strings!

Third, each line includes its trailing newline character, written '\n', which marks where one line ends and the next begins. To get rid of it, you can use the string method rstrip as follows:

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(line.rstrip())
    print a, b
file.close()
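One detail about rstrip: called with no arguments, it removes all trailing whitespace, not just the newline. If trailing spaces or tabs are meaningful in the data (an assumption; the sample file has none), passing "\n" limits the stripping to newlines. The sample string below is made up for illustration, and the code is written for Python 3.

```python
# Hypothetical line with two trailing spaces before the newline.
line = "AAATT  \n"

stripped_all = line.rstrip()          # removes spaces and the newline
stripped_newline = line.rstrip("\n")  # removes only the newline

print(repr(stripped_all))      # 'AAATT'
print(repr(stripped_newline))  # 'AAATT  '
```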

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 