
Question about nested loops

I am new to programming and am having trouble figuring out nested loops. I have a list of trials that I want to extract from a larger file. I can extract one trial successfully, but I need to extract 100 different trials from a file containing thousands of them. Each trial is one line of the larger file. Below is the program I have used to extract one line at a time; in this example it extracts the data for trial 1. It is based on examples I have seen in prior questions and tutorials. The problem is that I don't need trials 1-100 or any other ordered pattern. I need trials 134, 274, 388, etc.; the numbers skip around. So I don't know how to write a nested loop with the for statement when there is no range I can enter. Any help is appreciated. Thanks.

completedataset = open('completedataset.txt', 'r')

smallerdataset = open('smallerdataset.txt', 'w')


for line in completedataset:
    if 'trial1' in line:
        smallerdataset.write(line)


completedataset.close()
smallerdataset.close()

I'd really like to do it like this:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line: smallerdataset(line)

but this isn't working. Can anyone help me modify this program so that it works correctly?

It seems to me that you need a list that holds all the trial numbers that you are interested in. So maybe you could try something like this:

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

trials = [134, 274, 388]
completedata = completedataset.readlines()

for t in trials:
    for line in completedata:
        if "trial"+str(t) in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()

You could just do this:

trials = ['trial1', 'trial134', 'trial274']

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)

For more efficiency, you could extract the trial identifier from each line with a regex such as 'trial[0-9]+' and check whether that identifier is in a set.
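A minimal sketch of the regex-plus-set idea, assuming the trial label appears in each line as the word trial followed by digits. The names wanted, trial_pattern and select_lines, and the sample lines, are made up for illustration.

```python
import re

# Hypothetical set of wanted trial labels; set membership tests are fast.
wanted = {'trial134', 'trial274', 'trial388'}

# Matches the whole label, so 'trial134' is not found inside 'trial1345'.
trial_pattern = re.compile(r'trial[0-9]+')

def select_lines(lines, wanted):
    """Yield the lines that contain at least one wanted trial label."""
    for line in lines:
        if any(label in wanted for label in trial_pattern.findall(line)):
            yield line

# Made-up sample data standing in for the lines of completedataset.txt.
sample = [
    'trial13 0.1 0.2\n',
    'trial134 0.3 0.4\n',
    'trial1345 0.5 0.6\n',
    'trial388 0.7 0.8\n',
]
selected = list(select_lines(sample, wanted))
```

With a real file you would pass the open file object instead of sample and write each yielded line to the output file. Each line is scanned once, no matter how many trials are wanted.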

If each trial in the complete set has a known, fixed byte size, you can use file.seek(n), where n is the byte offset to start reading at. For example, if each line in the file is 3 bytes long (including the newline), you could do something like:

myfile = open('file.txt', 'rb')  # binary mode, so offsets are exact byte positions
myfile.seek(lineToStartAt * 3)   # lineToStartAt: zero-based index of the wanted line

myfile.readline()  # etc.

If the number of bytes per line is variable or unknown, you simply have to read the lines in and discard the ones you don't care about (as in KLee1's answer).
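For the fixed-width case, here is a small sketch of jumping straight to a record with seek. It assumes every line is exactly 13 bytes including the newline. RECORD_SIZE, read_record and the generated data are invented for the demo, and io.BytesIO stands in for a file opened with open(path, 'rb').

```python
import io

RECORD_SIZE = 13  # assumed: every line is exactly 13 bytes, newline included

def read_record(f, index, record_size=RECORD_SIZE):
    """Return the record at zero-based position index from a binary file."""
    f.seek(index * record_size)
    return f.read(record_size)

# Build ten made-up fixed-width records such as b'trial003 006\n'.
records = [('trial%03d %03d\n' % (i, i * 2)).encode('ascii') for i in range(10)]
datafile = io.BytesIO(b''.join(records))

rec3 = read_record(datafile, 3)
rec7 = read_record(datafile, 7)
```

This only works if the position of a trial in the file can be computed from its number; otherwise you are back to scanning the lines.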

Assuming you know the trials ahead of time, you can do

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)

You are going to run into some problems with the way you are specifying your trials. If you look for lines containing 'trial1', you will also get lines containing 'trial123'. If your larger dataset is structured in some way, you can try looking for the trial number in a particular field. For instance, if the data is comma delimited, you can make use of the csv package. Finally, using a generator expression instead of the loop will make things a little cleaner. Assuming that the trial number is in the first column of your dataset, you could do something like:

import csv

trials = ['trial134', 'trial1', 'trial56']
data = csv.reader(open('completedataset.txt'))

with open('smalldataset.txt','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in trials)
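To make the 'trial1' versus 'trial123' pitfall concrete, here is a small Python 3 sketch comparing a substring test with an exact match on the first field. The data is made up, and it assumes the trial label sits in the first comma-separated column.

```python
import csv
import io

wanted = {'trial1', 'trial134'}  # hypothetical wanted labels

# Stand-in for the comma-delimited dataset; label assumed in column 0.
raw = io.StringIO(
    'trial1,0.10\n'
    'trial12,0.20\n'
    'trial134,0.30\n'
    'trial1340,0.40\n'
)
rows = list(csv.reader(raw))

# Substring test: 'trial1' is contained in every label here.
substring_hits = [r[0] for r in rows if any(w in r[0] for w in wanted)]

# Exact comparison of the first field against the set.
exact_hits = [r[0] for r in rows if r[0] in wanted]
```

The substring test keeps all four rows, while the exact comparison keeps only the two that were asked for.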

Assuming you had a function that, looking at a line, is able to tell you whether that line is "desired", the proper structure for your code would be very simple:

with open('completedataset.txt', 'r') as completedataset:
    with open('smallerdataset.txt', 'w') as smallerdataset:
        for line in completedataset:
            if iwantthisone(line):
                smallerdataset.write(line)

The with statements take care of closing the files for you. In Python 2.7, you could merge the two with statements into one; in Python 2.5, you need to start your module with from __future__ import with_statement; in Python 2.6, currently the most widespread version, the above code is the right form.
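For reference, the merged form available from Python 2.7 (and 3.1) onward might look like the sketch below. The temporary directory, the sample lines and the placeholder iwantthisone are assumptions made only so the snippet runs on its own.

```python
import os
import shutil
import tempfile

# Create a throwaway input file so the example is self-contained.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'completedataset.txt')
dst_path = os.path.join(workdir, 'smallerdataset.txt')
with open(src_path, 'w') as f:
    f.write('trial1 a\ntrial2 b\ntrial3 c\n')

def iwantthisone(line):
    # Placeholder predicate for the demo: keep only the trial2 line.
    return line.startswith('trial2 ')

# One with statement opens both files and closes both on exit.
with open(src_path, 'r') as completedataset, open(dst_path, 'w') as smallerdataset:
    for line in completedataset:
        if iwantthisone(line):
            smallerdataset.write(line)

with open(dst_path) as f:
    result = f.read()

shutil.rmtree(workdir)  # clean up the demo files
```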

So, absolutely everything boils down to that iwantthisone function. You don't tell us anything about the format of your lines, which makes it impossible for us to help much further. But assume, for example, that the first word in each line identifies the test, e.g. test432 ..., and that you have the numbers of the tests you want in a set named want_these, e.g. set([113, 432, 251, ...]). Then a very simple way to write iwantthisone might be:

def iwantthisone(line):
    firstword = line.split(None, 1)[0]  # first whitespace-delimited word, e.g. 'test432'
    testnumber = int(firstword[4:])     # drop the 4-character 'test' prefix
    return testnumber in want_these

The proper contents of iwantthisone depend entirely on your lines' format and on how you tell which lines you actually want to keep, of course. But I hope this general structure still helps.
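As a rough illustration of how such a predicate might be hardened, here is a Python 3 sketch that tolerates blank lines and lines whose first word is not of the form test followed by a number. The sample text and the numbers in want_these are invented, and the test prefix is the same assumption as above.

```python
import io

want_these = {113, 432, 251}  # assumed test numbers

def iwantthisone(line):
    words = line.split(None, 1)
    if not words or not words[0].startswith('test'):
        return False  # blank line or unexpected first word
    try:
        return int(words[0][4:]) in want_these
    except ValueError:
        return False  # e.g. 'testABC' or a bare 'test'

# Made-up input standing in for the open completedataset file.
source = io.StringIO('test113 a b\n\ntest4321 c d\ntest432 e f\nheader line\n')
kept = [line for line in source if iwantthisone(line)]
```

Because the comparison is on the integer, test4321 is not mistaken for test432.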

Note that there are really no nested loops in this general structure I recommend!-)

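Regarding the error message you showed in the comments: the line continuation character is a backslash, so the message is telling you that there is a stray backslash character somewhere in that line.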
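A quick way to see this kind of error is to compile a line with a backslash that is not the last character on the line. The source string below is made up and is not the asker's actual code.

```python
# A backslash followed by more text on the same line is a syntax error.
bad_source = "total = 1 + \\ 2\n"

try:
    compile(bad_source, '<example>', 'exec')
    error_message = None
except SyntaxError as exc:
    error_message = exc.msg  # mentions the line continuation character
```

A backslash is only valid as the very last character of a line, where it joins that line with the next one.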

Assuming the lines always start with the trial identifier you can use the startswith function and filter to pull out the ones you want.

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

wantedtrials = ('trial134', 'trial274', 'trial388')

for line in completedataset:
    # list() makes the test work on Python 3 too, where filter returns an iterator
    if list(filter(line.startswith, wantedtrials)):
        smallerdataset.write(line)
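As an alternative sketch, str.startswith also accepts a tuple of prefixes, so the check can be written without filter. The helper name is_wanted and the sample lines are made up.

```python
wantedtrials = ('trial134', 'trial274', 'trial388')

def is_wanted(line, prefixes=wantedtrials):
    # str.startswith returns True if the line starts with any prefix in the tuple.
    return line.startswith(prefixes)

# Made-up lines standing in for the contents of completedataset.txt.
lines = ['trial134 1.0\n', 'trial13 2.0\n', 'trial388 3.0\n', 'note trial274\n']
kept = [line for line in lines if is_wanted(line)]
```

Note that a prefix test still accepts a line starting with trial1345 when trial134 is wanted. If the identifier is followed by a known delimiter, including that delimiter in each prefix avoids this.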


 