
Question about nested loops

I am new to programming and am having trouble figuring out nested loops. I have a list of data that I want to extract from a larger file. I can extract one item of data from the larger file successfully, but I need to extract 100 different trials from this larger file of thousands of trials. Each trial is one line of data in the larger file.

The program below is what I have used to extract one line of data at a time. In this example it extracts the data for trial 1. It is based on examples I have seen in prior questions and tutorials.

The problem is that I don't need trials 1-100, or any ordered pattern. I need trials 134, 274, 388, etc.; the numbers skip around. So I don't know how to write a nested loop using the for statement if there is no range that I can enter. Any help is appreciated. Thanks.

completedataset = open('completedataset.txt', 'r')

smallerdataset = open('smallerdataset.txt', 'w')


for line in completedataset:
    if 'trial1' in line:
        smallerdataset.write(line)


completedataset.close()
smallerdataset.close()

I'd really like to do it like this:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line: smallerdataset(line)

but this isn't working. Can anyone help me modify this program so that it works correctly?

It seems to me that you need a list that holds all the trial numbers that you are interested in. So maybe you could try something like this:

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

trials = [134, 274, 388]
completedata = completedataset.readlines()

for t in trials:
    for line in completedata:
        if "trial"+str(t) in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()

You could just do this:

trials = ['trial1', 'trial134', 'trial274']

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)
            break  # stop after the first match so the line is written only once

For more efficient operation, you could match each line against the regex 'trial[0-9]+' and check whether the matched identifier is in a set.
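The regex-plus-set idea might look something like the sketch below. It assumes each line contains at most one identifier of the form trial followed by digits; the names WANTED, TRIAL_RE, select_lines and filter_file are made up for illustration and do not come from the question.

```python
import re

# Hypothetical identifiers to keep; a set gives constant-time membership tests.
WANTED = {'trial134', 'trial274', 'trial388'}

# The \b anchors ensure the whole identifier is matched, so a line holding
# 'trial1345' is not mistaken for 'trial134'.
TRIAL_RE = re.compile(r'\btrial[0-9]+\b')


def select_lines(lines, wanted=WANTED):
    """Yield the lines whose first trial identifier is in `wanted`."""
    for line in lines:
        match = TRIAL_RE.search(line)
        if match and match.group(0) in wanted:
            yield line


def filter_file(src_path, dst_path, wanted=WANTED):
    """Copy the wanted lines from src_path to dst_path."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.writelines(select_lines(src, wanted))
```

With this shape each line is scanned once and then looked up in the set, instead of being tested against every wanted identifier in turn. Calling filter_file('completedataset.txt', 'smallerdataset.txt') would apply it to the files from the question.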

If each trial in the complete set has a known, fixed byte size, you can use file.seek(n), where n is the byte offset to start reading at. For example, if each line in the file is 3 bytes long, you could do something such as:

myfile = open('file.txt', 'r')
myfile.seek(lineToStartAt * 3)

myfile.readline()#etc
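A slightly fuller sketch of the same idea, assuming every line really is exactly the same number of bytes (newline included) and that line numbers are counted from 0. The helper name read_fixed_width_lines is hypothetical.

```python
def read_fixed_width_lines(path, line_numbers, line_size):
    """Return the requested lines from a file whose lines are all
    exactly `line_size` bytes long, newline included (0-based numbers)."""
    found = []
    # Binary mode keeps seek offsets in plain bytes.
    with open(path, 'rb') as f:
        for n in line_numbers:
            f.seek(n * line_size)           # jump straight to the start of line n
            found.append(f.read(line_size))
    return found
```

The lines come back as bytes objects; decode them if text is needed. This only pays off when the fixed-size assumption holds for the entire file.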

If the number of bytes per line is variable or unknown, you would simply have to read in the lines and discard the ones you don't care about (as in KLee1's answer).

Assuming you know the trials ahead of time, you can do:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)
            break  # stop after the first match so the line is written only once

You are going to run into some problems with the way you are specifying your trials. If you look for lines containing 'trial1', you will also get lines containing 'trial123'. If your larger dataset is structured in some way, you can try looking for the trial number in a particular field. For instance, if the data is comma delimited, you can make use of the csv package. Finally, using a generator expression instead of the loop will make things a little cleaner. Assuming that the trial number is in the first column of your dataset, you could do something like:

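import csv

# A set gives constant-time membership tests.
trials = {'trial134', 'trial1', 'trial56'}

# newline='' lets the csv module handle line endings itself (Python 3).
with open('completedataset.txt', newline='') as inf:
    with open('smallerdataset.txt', 'w', newline='') as outf:
        rows = csv.reader(inf)
        # Keep only non-empty rows whose first column is a wanted trial.
        csv.writer(outf).writerows(row for row in rows if row and row[0] in trials)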

Assuming you had a function that, given a line, can tell you whether that line is "desired", the proper structure for your code would be very simple:

with open('completedataset.txt', 'r') as completedataset:
    with open('smallerdataset.txt', 'w') as smallerdataset:
        for line in completedataset:
            if iwantthisone(line):
                smallerdataset.write(line)

The with statements take care of the closing for you. In Python 2.7, you could merge the two with statements into one; in Python 2.5, you need to start your module with from __future__ import with_statement; in Python 2.6, currently the most widespread version, the above code is the right form.
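For reference, the merged form available from Python 2.7 onward might look like the sketch below. The wrapper function copy_wanted_lines and its parameters are only an illustration; the predicate is passed in as an argument rather than being defined here.

```python
def copy_wanted_lines(src_path, dst_path, iwantthisone):
    """Copy every line for which iwantthisone(line) is true."""
    # A single with statement manages both files (Python 2.7 and later);
    # both are closed on exit, even if an exception is raised.
    with open(src_path, 'r') as completedataset, open(dst_path, 'w') as smallerdataset:
        for line in completedataset:
            if iwantthisone(line):
                smallerdataset.write(line)
```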

So, absolutely everything boils down to that iwantthisone function. You don't tell us anything about the format of your lines, making it impossible for us to help you much further. But assume, for example, that the first word in each line identifies the test, e.g. test432 ..., and you have the numbers of the tests you want in a set named want_these, e.g. set([113, 432, 251, ...]). Then, a very simple way to write iwantthisone might be:

def iwantthisone(line):
    firstword = line.split(None, 1)[0]
    testnumber = int(firstword[4:])
    return testnumber in want_these

The proper contents of iwantthisone depend entirely on your lines' format and on how you tell which lines you actually want to keep, of course. But I hope this general structure still helps.

Note that there are really no nested loops in this general structure I recommend!-)

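Regarding the error message you showed in the comments: the line continuation character is a backslash, so the message is telling you that there is a stray backslash character somewhere on that line.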

Assuming the lines always start with the trial identifier, you can use the startswith function and filter to pull out the ones you want.

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

wantedtrials = ('trial134', 'trial274', 'trial388')

for line in completedataset:
    # list() is needed on Python 3, where filter returns a lazy iterator that is always truthy
    if list(filter(line.startswith, wantedtrials)):
        smallerdataset.write(line)
