
Avoiding code duplication in Python code

Consider the following Python snippet:

af=open("a",'r')
bf=open("b", 'w')

for i, line in enumerate(af):
    if i < K:
        bf.write(line)

Now, suppose I want to handle the case where K is None, so that writing continues to the end of the file. I'm currently doing:

if K is None:
    for i, line in enumerate(af):
        bf.write(line)
else:
    for i, line in enumerate(af):
        if i == K:  # stop before writing line K, so exactly K lines are copied
            break
        bf.write(line)

This clearly isn't the best way to handle this, as I'm duplicating the code. Is there a more integrated way to handle it? The natural thing would be to have the if/break code present only if K is not None, but this involves writing syntax on the fly à la Lisp macros, which Python can't really do. Just to be clear, I'm not so much concerned about this particular case (which I chose partly for its simplicity) as about learning general techniques I may not be familiar with.

UPDATE: After reading the answers people have posted and doing more experimentation, here are some more comments.

As said above, I was looking for techniques that would generalize, and I think @Paul's answer, namely using takewhile from itertools, fits that best. As a bonus, it is also much faster than the naive method I listed above; I'm not sure why. I'm not really familiar with itertools, though I've looked at it a few times. From my perspective this is a case of functional programming For The Win. (Amusingly, the author of itertools once asked for feedback about dropping takewhile. See the thread beginning http://mail.python.org/pipermail/python-list/2007-December/522529.html.) I'd simplified my situation above; the actual situation is a bit messier, as I'm writing to two different files in the loop. So the code looks more like:

for i, line in enumerate(af):
    if i < K:
        bf.write(line)
        cf.write(line.split(',')[0].strip('"')+'\n')

Given my posted example, @Jeff reasonably suggested that when K is None I just copy the file. Since in practice I am looping anyway, doing so is not such a clear choice. However, takewhile generalizes painlessly to this case. I also had another use case that I did not mention here, and was able to use takewhile there too, which was nice. The second example looks like this (verbatim):

i=0
for line in takewhile(illuminacond, af):
    line_split=line.split(',')
    pid=line_split[1][0:3]
    out = line_split[1] + ',' + line_split[2] + ',' + line_split[3][1] + line_split[3][3] + ',' \
                        + line_split[15] + ',' + line_split[9] + ',' + line_split[10]
    if pid!='cnv' and pid!='hCV' and pid!='cnv':
        i = i+1
        of.write(out.strip('"')+'\n')
        tf.write(line)

Here I was able to use the condition

if K is None:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]'
else:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]' and i < K

per @Paul's original example. However, I'm not completely happy that I'm getting i from the outer scope, though the code works. Is there a better way of doing this? Or maybe it should be a separate question. Anyway, thanks to everyone who answered my question. Honorable mention to @Jeff, who made some nice suggestions.
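One possible way to avoid reading i from the enclosing scope, offered only as a sketch: let takewhile handle the '[Controls]' sentinel, filter the lines, and let itertools.islice cap how many kept lines come out. The function name kept_lines, its parameters, and the sample data are hypothetical; the column positions and the 'cnv'/'hCV' prefixes are taken from the snippet above.

```python
from itertools import islice, takewhile

def kept_lines(lines, k=None):
    """Return an iterator over at most k lines that pass the pid filter.

    Iteration stops at the first line whose first field is '[Controls]'.
    k=None means no cap. (Hypothetical helper, for illustration only.)
    """
    before_controls = takewhile(
        lambda line: line.split(',')[0] != '[Controls]', lines)
    wanted = (line for line in before_controls
              if line.split(',')[1][0:3] not in ('cnv', 'hCV'))
    # islice treats a stop of None as "no limit", so no counter is needed.
    return islice(wanted, k)

# Made-up sample rows; only the first two columns matter here.
sample = [
    "r1,rs123,x\n",
    "r2,cnv456,x\n",
    "r3,rs789,x\n",
    "[Controls],,\n",
    "r4,rs000,x\n",
]
print(list(kept_lines(sample, 1)))     # ['r1,rs123,x\n']
print(len(list(kept_lines(sample))))   # 2
```

Because the cap is applied after the filter, it counts written lines rather than lines read, which is what the i counter in the loop above was tracking.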

for i, line in enumerate(af):  
    if K is None or i < K:
        bf.write(line)
    else:
        break

itertools.takewhile will apply your condition and then break out of the loop the first time the condition fails.

from itertools import takewhile

if K is None:
    condition = lambda x: True
else:
    condition = lambda x: x[0] < K

for i,line in takewhile(condition, enumerate(af)):
    bf.write(line)

If K is None, you don't want takewhile ever to stop, so the condition function should always return True. But if you are given a numeric value for K, takewhile stops as soon as the 0th element of the tuple passed to the condition is >= K.
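To see this behaviour without touching real files, here is a small sketch that runs the same two conditions over an in-memory list. The helper name take_first and the sample data are illustrative assumptions, not part of the answer above.

```python
from itertools import takewhile

def take_first(lines, k=None):
    """Return the first k items of lines, or all of them when k is None."""
    if k is None:
        condition = lambda pair: True          # never stop early
    else:
        condition = lambda pair: pair[0] < k   # pair is (index, line)
    return [line for _, line in takewhile(condition, enumerate(lines))]

sample = ["a\n", "b\n", "c\n", "d\n"]
print(take_first(sample, 2))   # ['a\n', 'b\n']
print(take_first(sample))      # all four lines
```

One detail worth knowing: takewhile has to read the first failing item to learn that it should stop, so that item is consumed from the underlying iterator even though it is not yielded.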

If you must loop, how about this?

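from sys import maxsize  # Python 3 name; Python 2 had sys.maxint

limit = maxsize if K is None else K  # "K or maxsize" would treat K == 0 as no limit
for i, line in enumerate(af):
    if i >= limit: break
    bf.write(line)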

Or even this?

from itertools import islice

bf.writelines(islice(af, K))  # islice treats a stop of None as "no limit"
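A quick sketch of the islice approach on in-memory data. It relies on the fact that islice accepts None as its stop argument and then runs to the end of the input, so no sentinel value is needed and a limit of 0 yields nothing rather than everything. The helper name head and the sample list are assumptions made for illustration.

```python
from itertools import islice

def head(lines, k=None):
    """Return the first k items as a list, or everything when k is None."""
    # A stop of None tells islice to continue until the input is exhausted.
    return list(islice(lines, k))

sample = ["a\n", "b\n", "c\n"]
print(head(sample, 2))   # ['a\n', 'b\n']
print(head(sample))      # ['a\n', 'b\n', 'c\n']
print(head(sample, 0))   # []
```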

Why loop at all in the case that K is None?

from shutil import copyfile

aname = 'a'
bname = 'b'
if K is None:
    copyfile(aname, bname)
else:
    af = open(aname, 'r')
    bf = open(bname, 'w')
    for i, line in enumerate(af):
        if i < K:
            bf.write(line)

Whatever K is, it's always going to be less than infinity.

if K is None:
    K = float('inf') # infinity

for i, line in enumerate(af):
    if i == K:  # checked before the write, so exactly K lines are copied
        break
    bf.write(line)

Or, setting K = -1 works just as well, though it's less semantically correct. Ideally you would set K to the number of lines in af, but I presume that information is not cheaply available.
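The sentinel idea can be checked in isolation. The sketch below uses an ordering test (i >= limit) in the spirit of the question's first snippet; the function name copy_lines and the io.StringIO stand-ins for the two files are assumptions made for illustration.

```python
import io

def copy_lines(src, dst, k=None):
    """Copy at most k lines from src to dst; k=None copies everything."""
    limit = float('inf') if k is None else k   # every int compares below inf
    for i, line in enumerate(src):
        if i >= limit:
            break
        dst.write(line)

out = io.StringIO()
copy_lines(io.StringIO("a\nb\nc\n"), out)
print(repr(out.getvalue()))   # 'a\nb\nc\n'
```

Note that the K = -1 trick depends on an equality test such as i == K; with an ordering test like the one in this sketch, a negative limit would stop the loop immediately.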

I think you're in a situation where you are going to have to accept a trade-off between DRY principles and optimization.

I would start by staying true to DRY principles and remove the duplicate code with a function like write_until:

def write_until(file_in, file_out, break_on):
    # Copy lines until break_on(index, line) returns True.
    for i, line in enumerate(file_in):
        if break_on(i, line):
            break
        else:
            file_out.write(line)

af=open("a",'r')
bf=open("b", 'w')

if K is None:
    write_until(af,bf,lambda i,line: False)
else:
    write_until(af,bf,lambda i,line: i>=K)  # i>K would copy K+1 lines

Then actually use the code and see whether you really need to optimize. How much performance improvement will you honestly see from removing an if False check? If you really need that extra speed boost (which I doubt), then you'll just have to live with some code duplication.
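For anyone who wants to try the idea without creating files, here is a self-contained sketch of the same function running on io.StringIO objects. The sample contents and the second, content-based predicate (borrowed from the '[Controls]' marker mentioned in the question) are assumptions made for illustration.

```python
import io

def write_until(file_in, file_out, break_on):
    """Copy lines from file_in to file_out until break_on(i, line) is true."""
    for i, line in enumerate(file_in):
        if break_on(i, line):
            break
        file_out.write(line)

# Stop by position: copy only the first two lines.
dst = io.StringIO()
write_until(io.StringIO("x\ny\nz\n"), dst, lambda i, line: i >= 2)
print(repr(dst.getvalue()))    # 'x\ny\n'

# Stop by content: copy everything before a '[Controls]' row.
dst2 = io.StringIO()
write_until(io.StringIO("x\n[Controls],1\nz\n"), dst2,
            lambda i, line: line.split(',')[0] == '[Controls]')
print(repr(dst2.getvalue()))   # 'x\n'
```

Passing the stopping rule as a function keeps the loop in one place, and the same loop serves both index-based and content-based limits.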
