
Deduplicating code in slightly different functions

I have two very similar loops, and both contain an inner loop that is very similar to a third loop (eh... :)). Illustrated with code, it looks close to this:

# First function
def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        for build in array[test_index]:  # <- All functions have this loop

            # Retrieved tests is calculated inside the build loop in kfold1
            retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Second function
def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Third function
def fmeasure_all(array):
    ret = []
    for build in array:  # <- All functions have this loop

        relevant = set(build['tests'])
        fval = calc_f2(relevant)  # <- Instead of calc_f, I call calc_f2
        if fval is not None:
            ret.append(fval)

    return ret

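The first two functions differ only in how and when retrieved_tests is calculated. The third function matches the inner loop of the first two, except that it calls a different function, calc_f2, and does not use retrieved_tests.

In reality the code is more complex. The duplication irked me, but I thought I could live with it. Lately, however, I have been making changes to it, and having to change two or three places at once is annoying.

Is there a good way to merge the duplicated code? The only way I could think of involves introducing classes, which adds a lot of boilerplate, and I would like to keep these functions pure if possible.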


Edit

These are the contents of calc_f and calc_f2:

def calc_f(relevant, retrieved):
    """Calculate the F-measure given relevant and retrieved tests."""
    recall = len(relevant & retrieved)/len(relevant)
    prec = len(relevant & retrieved)/len(retrieved)
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)


def calc_f2(relevant, nbr_tests=1000):
    """Calculate the F-measure given relevant tests."""
    recall = 1
    prec = len(relevant) / nbr_tests
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)

f_measure calculates the harmonic mean of precision and recall.

Basically, calc_f2 takes a lot of shortcuts, since no retrieved tests are needed.
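The body of f_measure is not shown in the question. For readers who want something concrete, here is a minimal sketch of a harmonic mean of recall and precision. Returning None when both inputs are zero is an assumption made here (the mean is undefined in that case); the asker's real function may behave differently.

```python
def f_measure(recall, prec):
    """Harmonic mean of recall and precision (the F1 score).

    Assumption: return None when both inputs are zero, because the
    harmonic mean is undefined there. The original may differ.
    """
    if recall + prec == 0:
        return None
    return 2 * recall * prec / (recall + prec)


print(f_measure(0.5, 0.5))  # 0.5
```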

One option is a common function with an extra parameter that controls where retrieved_tests is calculated.

For example:

def fmeasure_kfold_generic(array, nfolds, mode):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        if mode==2:
            retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            if mode==1:
                retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

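One way is to write each inner loop as a function, and then write the outer loop as a separate function that receives the inner one as an argument. This is close to how sorting functions work: they receive a function that compares two elements.

Of course, the hard part is finding exactly what all the functions have in common, which is not always simple.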
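As a rough illustration of that idea, the sketch below passes the inner loop to the outer loop as a plain function argument. The names for_each_fold and count_shared and the toy data are hypothetical and stand in for the question's real helpers.

```python
def for_each_fold(folds, per_fold):
    """Outer loop: run per_fold on every (train, test) pair and pool the results."""
    results = []
    for train, test in folds:
        results.extend(per_fold(train, test))
    return results


def count_shared(train, test):
    """Hypothetical inner loop: 1 for each test item also seen in training, else 0."""
    seen = set(train)
    return [1 if item in seen else 0 for item in test]


folds = [([1, 2, 3], [3, 4]), ([4, 5], [5, 5])]
print(for_each_fold(folds, count_shared))  # [1, 0, 1, 1]
```

Swapping in a different inner function changes the per-fold behaviour without touching the iteration code, much like passing a different key or comparison function to a sort.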

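A typical solution is to identify the parts of the algorithm and use the Template Method design pattern, where the different stages are implemented in subclasses. I don't understand your code at all, but I assume there would be methods such as makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?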
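A minimal sketch of what that could look like in Python follows. The hook names are snake_case versions of the ones suggested above. Everything else is an assumption made for illustration: the class names, the overlap-count score that stands in for calc_f, the dict-based correlation, and the simplification that each fold arrives as an already analyzed (correlation, builds) pair.

```python
class FMeasureTemplate:
    """Fixed skeleton; subclasses decide how retrieved tests are produced."""

    def run(self, folds):
        ret = []
        for correlation, builds in folds:
            # Hook 1: computed once per fold.
            shared = self.make_global_retrieved_tests(correlation)
            for build in builds:
                # Hook 2: computed once per build.
                retrieved = self.make_individual_retrieved_tests(
                    build, correlation, shared)
                ret.append(self.score(set(build['tests']), retrieved))
        return ret

    def make_global_retrieved_tests(self, correlation):
        return None  # default: nothing is precomputed per fold

    def make_individual_retrieved_tests(self, build, correlation, shared):
        raise NotImplementedError

    def score(self, relevant, retrieved):
        return len(relevant & retrieved)  # stand-in for calc_f


class PerBuild(FMeasureTemplate):
    """Like kfold1: retrieved tests depend on each build's modules."""

    def make_individual_retrieved_tests(self, build, correlation, shared):
        found = set()
        for module in build['modules']:
            found |= correlation.get(module, set())
        return found


class PerFold(FMeasureTemplate):
    """Like kfold2: retrieved tests are computed once per fold and reused."""

    def make_global_retrieved_tests(self, correlation):
        return set().union(*correlation.values())

    def make_individual_retrieved_tests(self, build, correlation, shared):
        return shared


correlation = {'a': {'t1'}, 'b': {'t2', 't3'}}
builds = [{'modules': ['a'], 'tests': ['t1', 't2']},
          {'modules': ['b'], 'tests': ['t3']}]
folds = [(correlation, builds)]
print(PerBuild().run(folds))  # [1, 1]
print(PerFold().run(folds))   # [2, 1]
```

This is the class-based route the question hoped to avoid, so it mainly serves as a point of comparison with the function-passing approaches.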

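I would approach the problem inside-out, by factoring out the innermost loop. This works well with a "functional" style (as well as with "functional programming"). It seems to me that if you generalize fmeasure_all a bit, you could implement all three functions in terms of it. Something like: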

def fmeasure(builds, calcFn, retrieveFn):
    ret = []
    for build in builds:
        relevant = set(build['tests'])
        fval = calcFn(relevant, retrieveFn(build))
        if fval is not None:
            ret.append(fval)

    return ret

This allows you to define:

def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        ret += fmeasure(array[test_index], calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))

    return ret


def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)

        ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)

    return ret


def fmeasure_all(array):
    return fmeasure(array,
                    lambda relevant, _: calc_f2(relevant),
                    lambda x: x)

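Now fmeasure_kfold1 and fmeasure_kfold2 look awfully similar. They differ mostly in how they call fmeasure, so we can implement a generic fmeasure_kfoldn function that centralizes the iteration and the collection of results: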

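def fmeasure_kfoldn(array, nfolds, measure_fn):
    # measure_fn(builds, correlation) returns a list of results for one fold.
    # (Named measure_fn rather than callable to avoid shadowing the builtin.)
    ret = []
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += measure_fn(array[test_index], correlation)
    return ret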

This makes it very easy to define fmeasure_kfold1 and fmeasure_kfold2:

def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f, lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)


def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)
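To try the refactored shape end to end, here is a self-contained sketch that runs on plain lists. All helpers in it (kfold, analyze, get_tests, sum_tests, calc_f) are simplified stand-ins invented for this example. They are not the question's real implementations, and kfold is not scikit-learn's KFold.

```python
def kfold(n, nfolds):
    """Stand-in splitter: yield (train, test) index lists for contiguous folds."""
    size = n // nfolds
    for k in range(nfolds):
        test = list(range(k * size, (k + 1) * size))
        train = [i for i in range(n) if i not in test]
        yield train, test

def analyze(builds):
    """Stand-in: map each module to the tests seen with it in training."""
    correlation = {}
    for build in builds:
        for module in build['modules']:
            correlation.setdefault(module, set()).update(build['tests'])
    return correlation

def get_tests(modules, correlation):
    """Stand-in: tests correlated with any of the given modules."""
    found = set()
    for module in modules:
        found |= correlation.get(module, set())
    return found

def sum_tests(correlation):
    """Stand-in: every test known to the correlation."""
    return set().union(*correlation.values())

def calc_f(relevant, retrieved):
    """Stand-in score: overlap size, or None when nothing was retrieved."""
    return len(relevant & retrieved) if retrieved else None

def fmeasure(builds, calc_fn, retrieve_fn):
    ret = []
    for build in builds:
        fval = calc_fn(set(build['tests']), retrieve_fn(build))
        if fval is not None:
            ret.append(fval)
    return ret

def fmeasure_kfoldn(array, nfolds, measure):
    ret = []
    for train_index, test_index in kfold(len(array), nfolds):
        correlation = analyze([array[i] for i in train_index])
        ret += measure([array[i] for i in test_index], correlation)
    return ret

def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)

def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved = sum_tests(correlation)  # computed once per fold
        return fmeasure(builds, calc_f, lambda _: retrieved)
    return fmeasure_kfoldn(array, nfolds, measure)

builds = [
    {'modules': ['a'], 'tests': ['t1']},
    {'modules': ['b'], 'tests': ['t2']},
    {'modules': ['a'], 'tests': ['t1', 't3']},
    {'modules': ['c'], 'tests': ['t2']},
]
print(fmeasure_kfold1(builds, 2))  # [1, 1]
print(fmeasure_kfold2(builds, 2))  # [1, 1, 1, 1]
```

With the per-build variant, two builds retrieve nothing and are skipped by the None check, which is why the first list is shorter than the second.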

