Deduplicating code in slightly different functions
I have two very similar loops, and both contain an inner loop that is very similar to a third loop (eh... :)). Illustrated with code, it looks close to this:
# First function
def fmeasure_kfold1(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            retrieved_tests = get_tests(set(build['modules']), correlation)
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    return ret

# Second function
def fmeasure_kfold2(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)
        for build in array[test_index]:  # <- All functions have this loop
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    return ret

# Third function
def fmeasure_all(array):
    ret = []
    for build in array:  # <- All functions have this loop
        relevant = set(build['tests'])
        fval = calc_f2(relevant)  # <- Instead of calc_f, I call calc_f2
        if fval is not None:
            ret.append(fval)
    return ret
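The first two functions differ only in how, and at what point, they calculate retrieved_tests. The third function differs from the inner loop of the first two in that it calls calc_f2 and does not use retrieved_tests.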
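In reality the code is more complex, and although the duplication bothered me, I figured I could live with it. Lately, however, I have been making changes to it, and having to change two or three places at once is annoying.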
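Is there a good way to merge the duplicated code? The only way I could think of involves introducing classes, which brings a lot of boilerplate, and I would like to keep these functions as pure functions if possible.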
Edit

These are the contents of calc_f and calc_f2:
def calc_f(relevant, retrieved):
    """Calculate the F-measure given relevant and retrieved tests."""
    recall = len(relevant & retrieved)/len(relevant)
    prec = len(relevant & retrieved)/len(retrieved)
    fmeasure = f_measure(recall, prec)
    return (fmeasure, recall, prec)

def calc_f2(relevant, nbr_tests=1000):
    """Calculate the F-measure given relevant tests."""
    recall = 1
    prec = len(relevant) / nbr_tests
    fmeasure = f_measure(recall, prec)
    return (fmeasure, recall, prec)
f_measure calculates the harmonic mean of precision and recall.
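The body of f_measure is not shown in the question. A minimal sketch, assuming it is the standard balanced F-measure and that returning 0.0 when both inputs are zero is acceptable, might look like this:

```python
def f_measure(recall, prec):
    """Harmonic mean of recall and precision (assumed definition)."""
    if recall + prec == 0:
        # Assumption: report 0.0 rather than divide by zero.
        return 0.0
    return 2 * recall * prec / (recall + prec)
```

With this definition, a recall of 1 and a precision of 0.5 give an F-measure of about 0.667.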
Basically, calc_f2 takes a lot of shortcuts, since no retrieved tests are needed.
Have one common function that takes an extra parameter controlling where retrieved_tests is calculated.

For example:
def fmeasure_kfold_generic(array, nfolds, mode):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        if mode == 2:
            retrieved_tests = _sum_tests(correlation)
        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            if mode == 1:
                retrieved_tests = get_tests(set(build['modules']), correlation)
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    return ret
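One way is to write each inner loop as a function, and then write the outer loop as a separate function that receives the others as arguments. This is close to what sorting functions do when they receive the function used to compare two elements.
Of course, the hard part is working out exactly what all the functions have in common, which is not always simple.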
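The sorting analogy can be sketched as follows. This is only an illustration: collect_scores and the two scoring functions are hypothetical names that do not appear in the original code, and the builds are plain dicts standing in for the real data.

```python
def collect_scores(builds, score):
    """Shared loop: apply `score` to every build and keep non-None results."""
    ret = []
    for build in builds:
        value = score(build)
        if value is not None:
            ret.append(value)
    return ret


# Two hypothetical per-build steps that plug into the same loop.
def count_tests(build):
    return len(set(build['tests']))


def count_tests_if_any(build):
    tests = set(build['tests'])
    return len(tests) if tests else None


builds = [{'tests': ['a', 'b']}, {'tests': []}, {'tests': ['c', 'c']}]
print(collect_scores(builds, count_tests))         # [2, 0, 1]
print(collect_scores(builds, count_tests_if_any))  # [2, 1]
```

The loop and the None filter live in one place, and only the step that actually differs is passed in.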
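The typical solution is to identify the parts of the algorithm and use the Template Method design pattern, where the different stages are implemented in subclasses. I do not understand your code at all, but I assume there would be methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?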
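A rough sketch of that idea, using the two method names suggested above in Python naming style. Everything here is hypothetical: the class names, the score placeholder (which stands in for calc_f), and the dict-based correlation are assumptions made only so the example runs on its own.

```python
from abc import ABC, abstractmethod


class FoldMeasure(ABC):
    """Template method: `run` fixes the loop, subclasses fill in the stages."""

    def run(self, correlation, builds):
        shared = self.make_global_retrieved_tests(correlation)
        ret = []
        for build in builds:
            retrieved = self.make_individual_retrieved_tests(
                build, correlation, shared)
            ret.append(self.score(set(build['tests']), retrieved))
        return ret

    def make_global_retrieved_tests(self, correlation):
        # Default stage: nothing is computed once per fold.
        return None

    @abstractmethod
    def make_individual_retrieved_tests(self, build, correlation, shared):
        ...

    def score(self, relevant, retrieved):
        # Placeholder for calc_f: size of the overlap.
        return len(relevant & retrieved)


class PerBuild(FoldMeasure):
    """Like kfold1: retrieved tests depend on each build's modules."""

    def make_individual_retrieved_tests(self, build, correlation, shared):
        found = set()
        for module in build['modules']:
            found |= correlation.get(module, set())
        return found


class PerFold(FoldMeasure):
    """Like kfold2: retrieved tests are computed once and reused."""

    def make_global_retrieved_tests(self, correlation):
        return set().union(*correlation.values())

    def make_individual_retrieved_tests(self, build, correlation, shared):
        return shared


correlation = {'m1': {'t1'}, 'm2': {'t2', 't3'}}
builds = [{'modules': ['m1'], 'tests': ['t1', 't2']}]
print(PerBuild().run(correlation, builds))  # [1]
print(PerFold().run(correlation, builds))   # [2]
```

Note that this brings in the class boilerplate the question was hoping to avoid, so it is mainly worth considering if the stages grow beyond one or two small functions.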
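I would approach the problem inside-out, by factoring out the innermost loop. This works well with a "functional" style (as well as with "functional programming"). It seems to me that if you generalise fmeasure_all a bit, you could implement all three functions in terms of it. Something like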
def fmeasure(builds, calcFn, retrieveFn):
    ret = []
    for build in builds:  # iterate over the parameter, not an undefined `array`
        relevant = set(build['tests'])
        fval = calcFn(relevant, retrieveFn(build))
        if fval is not None:
            ret.append(fval)
    return ret
This lets you define:
def fmeasure_kfold1(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += fmeasure(array[test_index], calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))
    return ret

def fmeasure_kfold2(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)
        ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)
    return ret

def fmeasure_all(array):
    return fmeasure(array,
                    lambda relevant, _: calc_f2(relevant),
                    lambda x: x)
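By now, fmeasure_kfold1 and fmeasure_kfold2 look awfully similar. They differ mostly in how they call fmeasure, so we can implement a generic fmeasure_kfoldn function that centralises the iteration and collects the results: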
def fmeasure_kfoldn(array, nfolds, callable):
    ret = []
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += callable(array[test_index], correlation)
    return ret
This makes it very easy to define fmeasure_kfold1 and fmeasure_kfold2:
def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f, lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)

def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)
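To check that the pieces fit together, here is a self-contained sketch that runs the refactored functions end to end. The question does not show KFold, analyze, get_tests, or _sum_tests, so the versions below (including toy_kfold and the dict-based correlation) are toy stand-ins invented for illustration. This calc_f is also a simplified stand-in that returns None instead of dividing by zero, and plain lists replace the index-able array.

```python
# Toy stand-ins (assumptions) so the refactored functions can be run.
def toy_kfold(n, nfolds):
    """Yield (train_index, test_index) lists; a stand-in for KFold."""
    indices = list(range(n))
    for k in range(nfolds):
        test = indices[k::nfolds]
        train = [i for i in indices if i not in test]
        yield train, test


def analyze(builds):
    """Stand-in: map each module to the tests seen alongside it."""
    correlation = {}
    for build in builds:
        for module in build['modules']:
            correlation.setdefault(module, set()).update(build['tests'])
    return correlation


def get_tests(modules, correlation):
    found = set()
    for module in modules:
        found |= correlation.get(module, set())
    return found


def _sum_tests(correlation):
    return set().union(*correlation.values())


def calc_f(relevant, retrieved):
    """Stand-in that returns None instead of dividing by zero."""
    if not relevant or not retrieved:
        return None
    hits = len(relevant & retrieved)
    recall = hits / len(relevant)
    prec = hits / len(retrieved)
    fmeasure = 2 * recall * prec / (recall + prec) if hits else 0.0
    return (fmeasure, recall, prec)


def fmeasure(builds, calc_fn, retrieve_fn):
    ret = []
    for build in builds:
        fval = calc_fn(set(build['tests']), retrieve_fn(build))
        if fval is not None:
            ret.append(fval)
    return ret


def fmeasure_kfoldn(array, nfolds, measure):
    ret = []
    for train_index, test_index in toy_kfold(len(array), nfolds):
        correlation = analyze([array[i] for i in train_index])
        ret += measure([array[i] for i in test_index], correlation)
    return ret


def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)


def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)


data = [
    {'modules': ['m1'], 'tests': ['t1']},
    {'modules': ['m1'], 'tests': ['t1']},
    {'modules': ['m2'], 'tests': ['t2']},
    {'modules': ['m2'], 'tests': ['t2']},
]
print(fmeasure_kfold1(data, 2))
print(fmeasure_kfold2(data, 2))
```

On this toy data the per-build variant retrieves exactly the relevant test for every build, while the per-fold variant retrieves both tests each time and therefore scores a lower precision.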