How to temporarily keep a variable's value in memory and compare it in Python...

Question

Folks, I'm positive that I broke the logic with wrong indentation, but now I can't fix it. Could you please help me?

#
# analyzeNano.py - analyze XYZ file for 'sanity'
#

import csv
import sys
import os
import getopt

def main():
    '''
analyzeNano.py -d input-directory

analyzeNano.py analyzes a list of XYZ files inside input-directory. It counts the number of consecutive DNA samples with an identical ID, and if that count is between 96 and 110 it treats the file as 'good', otherwise as 'bad'.
    input-directory    an input directory where XYZ files are located
    -d    flag for input-directory
At the end it creates 2 files: goodNano.csv and badNano.csv
Note: files that are not in goodNano.csv and badNano.csv have no DNA ID and therefore not listed
'''
    try:
        opts, args = getopt.getopt(sys.argv[1:],'d:')
    except getopt.GetoptError, err:
        print str(err)
        help(main)
        sys.exit(2)

    if len(opts) != 1:
        help(main)
        sys.exit(2)

    if not os.path.isdir( sys.argv[2] ):
        print "Error, ", sys.argv[2], " is not a valid directory"
        help(main)
        sys.exit(2)


    prefix = 'dna'
    goodFiles = []
    badFiles = []

    fileList = os.listdir(sys.argv[2])
    for f in fileList:
        absFile = os.path.join(os.path.abspath(sys.argv[2]), f )
        with open(absFile, 'rb') as csvfile:
            # use csv to separate the fields, making it easier to deal with the
            # first value without hard-coding its size
            reader = csv.reader(csvfile, delimiter='\t')
            match = None
            count = 0

            for row in reader:
                # matching rows
                if row[0].lower().startswith(prefix):

                    if match is None:
                        # first line with prefix..
                        match = row[0]

                    if row[0] == match:
                        # found a match, so increment
                        count += 1

                    if row[0] != match:
                        # row prefix has changed
                        if 96 <= count < 110:
                            # counted enough, so start counting the next
                            match = row[0] # match on this now
                            count = 0 # reset the count
                            goodFiles.append(csvfile.name)
                        else:
                            # didn't count enough, so stop working through this file
                            badFiles.append(csvfile.name)
                            break

                # non-matching rows
                else:
                    if match is None:
                        # ignore preceding lines in file
                        continue
                    else:
                        # found non-matching line when expecting a match
                        break
    else:
        if not 96 <= count < 110:
                    #there was at least successful run of lines
            goodFiles.remove(csvfile.name)

    # Create output files
    createFile(goodFiles, 'goodNano')
    createFile(badFiles, 'badNano')

def createFile(files, fName):
    fileName = open( fName + ".csv", "w" )
    for f in files:
        fileName.write( os.path.basename(f) )
        fileName.write("\n")


if __name__ == '__main__':
    main()

Could someone just browse through it and point out where I broke it?

Answer 1

Here's how I would rework your style:

with open("z:/file.txt", "rU") as file: # U flag means Universal Newline Mode, 
                                        # if error, try switching back to b
    print(file.name)        
    counter = 0
    for line in file: # iterate over a file object itself line by line
        if line.lower().startswith('dna'): # look for your desired condition
            # process the data
            counter += 1

Answer 2

All variables are held in memory. You want to hold onto the most recent match and compare each new row against it, counting while it matches:

import csv

prefix = 'DNA'

with open('file.txt','rb') as csvfile:
    # use csv to separate the fields, making it easier to deal with the
    # first value without hard-coding its size
    reader = csv.reader(csvfile, delimiter='\t')
    match = None
    count = 0
    is_good = False
    for row in reader:
        # matching rows
        if row[0].startswith(prefix):

            if match is None:
                # first line with prefix..
                match = row[0]

            if row[0] == match:
                # found a match, so increment
                count += 1

            if row[0] != match:
                # row prefix has changed
                if 96 <= count < 100:
                    # counted enough, so start counting the next
                    match = row[0] # match on this now
                    count = 1 # restart the count; this row is the first of the new run
                else:
                    # didn't count enough, so stop working through this file
                    break

        # non-matching rows
        else:
            if match is None:
                # ignore preceding lines in file
                continue
            else:
                # found non-matching line when expecting a match
                break
    else:
        if 96 <= count < 100:
            # there was at least successful run of lines
            is_good = True

if is_good:
    print 'File was good'
else:
    print 'File was bad'
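
As an alternative way to express the same "hold the last value and count while it matches" idea, here is a minimal Python 3 sketch that swaps the hand-written loop for `itertools.groupby`. It is not the answerer's method. The names `run_lengths` and `is_good`, the `low`/`high` parameters, and the sample data are all hypothetical; the 96 to 100 range is taken from the loop above (the question itself uses 110 as the upper bound). As in the loop above, a non-DNA line that appears after the DNA lines have started makes the file bad.

```python
import csv
import io
from itertools import groupby


def run_lengths(rows, prefix="DNA"):
    """Yield (id, count) for each run of consecutive rows with the same first field.

    Rows that are empty or whose first field lacks the prefix are grouped under None.
    """
    def key(row):
        if row and row[0].startswith(prefix):
            return row[0]
        return None

    for run_id, group in groupby(rows, key=key):
        yield run_id, sum(1 for _ in group)


def is_good(rows, low=96, high=100, prefix="DNA"):
    """Return True if there is at least one DNA run and every run has low <= count < high."""
    seen_dna = False
    for run_id, count in run_lengths(rows, prefix):
        if run_id is None:
            if seen_dna:
                return False  # non-DNA line after the DNA lines began
            continue  # ignore lines that precede the first DNA line
        seen_dna = True
        if not low <= count < high:
            return False
    return seen_dna


# Usage with in-memory, tab-separated sample data (made up for illustration):
sample = "header\n" + "DNA0000000001\tx\n" * 96 + "DNA0000000002\ty\n" * 97
rows = csv.reader(io.StringIO(sample), delimiter="\t")
print(is_good(rows))
```

Because `groupby` only groups adjacent items with equal keys, each run is counted in full, including its first row, which is an easy place to be off by one in a hand-written loop.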

Answer 3

From your description, the lines you're interested in match the regular expression:

^DNA[0-9]{10}

That is, I assume that your xyz is actually ten digits.

The strategy here is to match the 13-character string. If there's no match, and we haven't previously matched, we keep going without further ado. Once we match, we save the string and increment a counter. As long as we keep matching the regex and the saved string, we keep incrementing. Once we hit a different regex match, or no match at all, the sequence of identical matches is over. If it's valid, we reset the count to zero and the last match to empty. If it's invalid, we exit.

I hasten to add that the following is untested.

import re
import sys

# Input file with DNA lines to match:
infile = "z:/file.txt"

# This is the regex for the lines of interest:
regex = re.compile('^DNA[0-9]{10}')

# This will keep count of the number of matches in sequence:
n_seq = 0

# This is the previous match (if any):
lastmatch = ''

# Subroutine to check given sequence count and bail if bad:
def bail_on_bad_sequence(count, match):
    if 96 <= count < 100:
        return
    sys.stderr.write("Bad count (%d) for '%s'\n" % (count, match))
    sys.exit(1)


with open(infile) as file:
    for line in file:
        # Try to match the line to the compiled regex:
        match = regex.match(line)

        if match:
            if match.group(0) == lastmatch:
                n_seq += 1
            else:
                # A different ID starts here; check the run that just ended (if any):
                if n_seq != 0:
                    bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 1  # the current line is the first of the new run
                lastmatch = match.group(0)
        else:
            if n_seq != 0:
                bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 0
                lastmatch = ''

# Check a run that extends to the end of the file:
if n_seq != 0:
    bail_on_bad_sequence(n_seq, lastmatch)
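
To try this strategy without a file on disk, the same state can be wrapped in a function that accepts any iterable of lines and reports bad runs instead of exiting. This is only a sketch in Python 3 under the same assumption as above (IDs are `DNA` followed by ten digits); `find_bad_runs`, `DNA_RE`, and the `low`/`high` parameters are hypothetical names introduced for illustration.

```python
import re

# Assumed ID format: "DNA" followed by ten digits at the start of the line.
DNA_RE = re.compile(r"^DNA[0-9]{10}")


def find_bad_runs(lines, low=96, high=100):
    """Return (id, count) for every run whose length is outside low <= count < high."""
    bad = []
    last_id, n_seq = None, 0

    def close_run():
        # Record the run that just ended if its length is out of range.
        if last_id is not None and not (low <= n_seq < high):
            bad.append((last_id, n_seq))

    for line in lines:
        m = DNA_RE.match(line)
        current = m.group(0) if m else None
        if current != last_id:
            close_run()
            last_id, n_seq = current, 0
        if current is not None:
            n_seq += 1

    close_run()  # a run may extend to the last line
    return bad


# Made-up sample: one run of 96 lines followed by a run of only 3 lines.
lines = (["header\n"]
         + ["DNA0000000001\ta\n"] * 96
         + ["DNA0000000002\tb\n"] * 3
         + ["other\n"])
print(find_bad_runs(lines))
```

An open file object can be passed directly as `lines`, since iterating over it yields one line at a time.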

Answer 4

Please ignore my last request to review the code. I reviewed it myself and realized that the problem was with formatting. It looks like it now works as expected and analyzes all files in the directory. Thanks again to Metthew. That help was tremendous. I still have some concern about the accuracy of the calculation, because in a few cases it failed when it should not have ... but I'll investigate it. Overall ... thanks a lot to everyone for the tremendous help.
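
One formatting detail that matters in code like this is which `for` loop an `else` clause lines up with. The sketch below illustrates Python's `for`/`else` behavior in general; it is not a statement about which line the poster actually changed. The function `classify` and its input (a dict mapping hypothetical file names to lists of run lengths) are made up for the example, and the 96 to 110 range is the one from the question.

```python
def classify(files):
    """Return the names whose run lengths all satisfy 96 <= count < 110."""
    good = []
    for name, counts in files.items():
        for count in counts:
            if not 96 <= count < 110:
                break  # a bad run: the else below is skipped for this file
        else:
            # Indented to match the inner `for`, so it runs once per file,
            # and only when that file's loop finished without `break`.
            good.append(name)
    return good


print(classify({"a.xyz": [96, 100], "b.xyz": [96, 12]}))
```

If the `else` were dedented to the level of the outer `for`, it would run at most once, after all files had been processed, rather than once per file.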

Answer timestamps:

Answer 1: 2014-03-14 01:48:53
Answer 2: 2014-03-14 02:06:45
Answer 3: 2014-03-14 03:15:49
Answer 4: 2014-03-15 05:00:33
