Converting an imperative algorithm to functional style

Question

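I wrote a simple procedure to compute the average test coverage of some specific packages in a Java project. The raw data, in a huge HTML file, looks like this: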

<body>  
package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>  
package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>  
package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>  
...   
</body>

For example, given the packages "pkg1" and "pkg3", the average line coverage is:

(11 + 33) / (111 + 333)

and the average branch coverage is:

(44 + 66) / (444 + 666)
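As a worked example of those two formulas, here is a minimal Python sketch that pools covered and total counts across the selected packages and divides once at the end. It assumes every package line looks exactly like the sample above (`package NAME <line_coverage>A/B,<branch_coverage>C/D<end>`); the regex `LINE_RE` and the function `pooled_coverage` are illustrative names, not part of the question.

```python
import re

# Assumed line format, copied from the sample above:
#   package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>
LINE_RE = re.compile(
    r"package\s+(\S+)\s+<line_coverage>(\d+)/(\d+),<branch_coverage>(\d+)/(\d+)<end>"
)


def pooled_coverage(lines, wanted):
    """Return (line_ratio, branch_ratio) pooled over the `wanted` packages."""
    cl = tl = cb = tb = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(1) in wanted:
            a, b, c, d = (int(g) for g in m.groups()[1:])
            cl, tl, cb, tb = cl + a, tl + b, cb + c, tb + d
    # Sum numerators and denominators first, then divide only once.
    return cl / tl, cb / tb


sample = [
    "package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>",
    "package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>",
    "package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>",
]
line_ratio, branch_ratio = pooled_coverage(sample, {"pkg1", "pkg3"})
print(f"Line coverage: {line_ratio:.2%}, branch coverage: {branch_ratio:.2%}")
```

Pooling like this weights each package by its size, which is what (11 + 33) / (111 + 333) expresses; it is not the same as averaging the per-package percentages.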

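I wrote the following procedure to get the result, and it works fine. But how can I implement this calculation in a functional style, something like "(if ..., for ..., for ..., for ..., for ..., b)"? I know a bit of Erlang, Haskell and Clojure, so solutions in those languages are welcome too. Thanks a lot!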

from __future__ import division, print_function
import re

# Sample data: each entry stands in for one package line of the coverage report.
datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for line in datafile:
    for pkg in core_pkgs:
        # Capture covered/total lines, then covered/total branches.
        ptn = re.compile(r'.*' + re.escape(pkg) + r'.*>(\d+)/(\d+).*>(\d+)/(\d+).*')
        match = ptn.match(line)
        if match is not None:
            cvln, tlln, cvbh, tlbh = match.groups()
            covered_lines += int(cvln)
            total_lines += int(tlln)
            covered_branches += int(cvbh)
            total_branches += int(tlbh)
print('Line coverage:', '{:.2%}'.format(covered_lines / total_lines))
print('Branch coverage:', '{:.2%}'.format(covered_branches / total_branches))
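For comparison, one possible functional-style rewrite of the procedure above in Python 3 is sketched below. It is not taken from any of the answers: the mutable counters are replaced by a generator of candidate matches, `filter` to drop the misses, and `functools.reduce` to fold the results together. The helper names `parse` and `add` are hypothetical.

```python
import re
from functools import reduce

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')


def parse(pkg, line):
    """Return (cvln, tlln, cvbh, tlbh) as ints, or None if the line doesn't match."""
    m = re.match(r'.*' + re.escape(pkg) + r'.*>(\d+)/(\d+).*>(\d+)/(\d+).*', line)
    return tuple(int(g) for g in m.groups()) if m else None


def add(a, b):
    """Element-wise sum of two 4-tuples."""
    return tuple(x + y for x, y in zip(a, b))


# One pass each: produce candidate matches, drop the misses (None), fold them together.
matches = (parse(pkg, line) for line in datafile for pkg in core_pkgs)
cl, tl, cb, tb = reduce(add, filter(None, matches), (0, 0, 0, 0))

print('Line coverage:', '{:.2%}'.format(cl / tl))
print('Branch coverage:', '{:.2%}'.format(cb / tb))
```

The starting value `(0, 0, 0, 0)` passed to `reduce` also keeps the fold well-defined when no line matches.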

Answer 1

Below you'll find my Haskell solution. I'll try to explain the main points I went through while writing it.

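First, you'll notice that I created a data type for the coverage data. It's generally a good idea to create data types to represent whatever data you're working with. This is partly because it lets you design your code in terms of your own model of the problem, which is closely tied to the functional programming philosophy, and partly because it can eliminate bugs where you think you're doing one thing but are actually doing something else.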
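In Python, a rough counterpart of that idea is an immutable `typing.NamedTuple`. Naming the fields lets code say `covered_lines` instead of relying on the position of a regex group. This is a sketch; the field and method names are illustrative assumptions that loosely follow the Haskell `Coverage` constructor's five fields.

```python
from typing import NamedTuple


class Coverage(NamedTuple):
    """Package name plus covered/total counts for lines and branches."""
    name: str
    covered_lines: int
    total_lines: int
    covered_branches: int
    total_branches: int

    def line_ratio(self) -> float:
        return self.covered_lines / self.total_lines

    def branch_ratio(self) -> float:
        return self.covered_branches / self.total_branches


d = Coverage("d", 11, 23, 34, 89)
print(d.name, d.line_ratio(), d.branch_ratio())
```

Because a `NamedTuple` is immutable, "updating" a value means building a new one (for example with `d._replace(...)`), which fits the functional style discussed here.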
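Related to the previous point: the first thing I do is convert the data from its string representation into my own data structure. When doing functional programming, you often work in "sweeps". You don't have one function that converts the data to some format, filters out the unwanted data, and summarises the result. You have three different functions for those tasks, and each one does just one of them!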
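This is because functions compose very easily: if you have three different functions, you can glue them together into a single function. If you start with one function, it's hard to break it apart into three different pieces.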
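The "small functions glued together" idea can be sketched in Python with a hypothetical `compose` helper (the standard library has no built-in function composition). To keep the sketch short, the three passes work on a made-up `name:covered:total` record format rather than on the question's real data.

```python
from functools import reduce


def compose(*fns):
    """compose(f, g, h)(x) == f(g(h(x))), like Haskell's f . g . h."""
    return lambda x: reduce(lambda acc, fn: fn(acc), reversed(fns), x)


# Three small, single-purpose passes over (name, covered, total) records.
def convert(lines):
    return [(name, int(c), int(t)) for name, c, t in (s.split(":") for s in lines)]


def keep_core(rows):
    return [r for r in rows if r[0] in {"d", "f"}]


def summarise(rows):
    return (sum(r[1] for r in rows), sum(r[2] for r in rows))


# Glued together into one function; the rightmost pass runs first.
line_coverage = compose(summarise, keep_core, convert)
print(line_coverage(["d:11:23", "e:25:65", "f:36:92"]))
```

Each pass can still be called and tested on its own, which is the point of building the pipeline from separate pieces.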
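The actual workings of the conversion function aren't very interesting unless you're specifically doing Haskell. All it does is try to match each string against a regex, and if the match succeeds, it adds the coverage data to the resulting list.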
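A Python sketch of the same conversion step, assuming Python's `re` module as a stand-in for Haskell's `Text.Regex`: `map` produces a match object or `None` for every line (like `map match`), and the comprehension's `if m` discards the failures (like `catMaybes`). The tuple output is an illustrative choice, not the answer's `Coverage` type.

```python
import re

# Same pattern as the Haskell `format` regex, written in Python syntax.
FORMAT = re.compile(r".*(\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*")


def convert(lines):
    """Try the regex on every line and keep only the lines that matched.

    Mirrors `catMaybes . map match`: map yields a match or None per line,
    and the `if m` filter plays the role of catMaybes.
    """
    matches = map(FORMAT.match, lines)
    return [(m[1], *(int(g) for g in m.groups()[1:])) for m in matches if m]


print(convert(['abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76']))
```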
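Again, some crazy composition is about to happen. I didn't write a function that loops over a list of coverages and sums them. I wrote a function that sums two coverages, because I know I can use it with the specialised fold loop (sort of a for loop on steroids) to sum all the coverages in a list. There's no need to reinvent the wheel and write the loop myself.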
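The same "sum two, then fold" trick in Python uses `functools.reduce`, Python's left fold and the closest standard-library counterpart of `foldl'`. The starting value `(0, 0, 0, 0)` plays the role of `Coverage "" 0 0 0 0`. In this sketch the tuples drop the package name for brevity; that simplification is an assumption, not part of the answer.

```python
from functools import reduce


def sum_coverage(a, b):
    """Add two (covered_lines, total_lines, covered_branches, total_branches) tuples."""
    return tuple(x + y for x, y in zip(a, b))


# Coverage of packages "d" and "f" from the question's sample data.
coverages = [(11, 23, 34, 89), (36, 92, 19, 76)]
total = reduce(sum_coverage, coverages, (0, 0, 0, 0))
print(total)  # (47, 115, 53, 165)
```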
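Besides, my sumCoverage function works with lots of specialised loops, so I don't have to write a ton of functions; I can just plug one function into a ton of prebuilt library functions!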
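In the main function you'll see what I mean by programming in "sweeps" or "passes" over the data. First I convert it to the internal format, then I filter out the unwanted data, and then I summarise what's left. These are completely independent computations. That's functional programming.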
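You'll also notice that I use two specialised loops there, filter and fold. This means I don't have to write any loops myself; I just plug my functions into these standard library loops and let them take over from there.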

import Data.Maybe (catMaybes)
import Data.List (foldl')
import Text.Printf (printf)
import Text.Regex (matchRegex, mkRegex)

corePkgs = ["d", "f"]

stats = [
  "d>11/23d>34/89d",
  "e>25/65e>13/25e",
  "f>36/92f>19/76"
  ]

format = mkRegex ".*(\\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*"


-- It might be a good idea to define a datatype for coverage data.
-- A bit of coverage data is defined as the name of the package it
-- came from, the lines covered, the total amount of lines, the
-- branches covered and the total amount of branches.
data Coverage = Coverage String Int Int Int Int


-- Then we need a way to convert the string data into a list of
-- coverage data. We do this by regex. We try to match on each
-- string in the list, and then we choose to keep only the successful
-- matches. Returned is a list of coverage data that was represented
-- by the strings.
convert :: [String] -> [Coverage]
convert = catMaybes . map match
  where match line = do
          [name, cl, tl, cb, tb] <- matchRegex format line
          return $ Coverage name (read cl) (read tl) (read cb) (read tb)


-- We need a way to summarise two coverage data bits. This can of course also
-- be used to summarise entire lists of coverage data, by folding over it.
sumCoverage :: Coverage -> Coverage -> Coverage
sumCoverage (Coverage nameA clA tlA cbA tbA) (Coverage nameB clB tlB cbB tbB) =
  Coverage (nameA ++ nameB ++ ",") (clA + clB) (tlA + tlB) (cbA + cbB) (tbA + tbB)


main = do
      -- First we need to convert the strings to coverage data
  let coverageData = convert stats
      -- Then we want to filter out only the relevant data
      relevantData = filter (\(Coverage name _ _ _ _) -> name `elem` corePkgs) coverageData
      -- Then we need to summarise it, but we are only interested in the numbers
      Coverage _ cl tl cb tb = foldl' sumCoverage (Coverage "" 0 0 0 0) relevantData

  -- So we can finally print them, as percentages like the Python version!
  printf "Line coverage: %.2f%%\n" (100 * fromIntegral cl / fromIntegral tl :: Double)
  printf "Branch coverage: %.2f%%\n" (100 * fromIntegral cb / fromIntegral tb :: Double)

Answer 2

Here are some quick, untested ideas for your code:

import re
import numpy as np

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
totals = np.zeros(4, dtype=int)  # running [cvln, tlln, cvbh, tlbh]

for pkg in core_pkgs:
    ptn = re.compile(r'.*' + re.escape(pkg) + r'.*>(\d+)/(\d+).*>(\d+)/(\d+).*')
    matches = map(ptn.match, datafile)  # function first, then the iterable
    statsList = [[int(g) for g in match.groups()] for match in matches if match]
    # statsList is a list of [cvln, tlln, cvbh, tlbh]
    if statsList:
        # axis=0 sums down the columns, giving one total per field
        totals += np.array(statsList).sum(axis=0)

covered_lines, total_lines, covered_branches, total_branches = totals

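OK, as you can see, I didn't bother finishing out the rest of the loop, but I think this gets the idea across. Of course, there's more than one way to do this. I chose to show off map() (some would say it makes things less efficient, and that may well be true) along with NumPy to do the (admittedly lightweight) math.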
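If NumPy feels heavy for four sums, the same column-wise total can be computed with the standard library alone. In this sketch, `zip(*rows)` transposes the list of rows into columns, and summing each column does what NumPy's `sum(axis=0)` does for a 2-D array. The sample rows are the question's matches for packages "d" and "f".

```python
rows = [[11, 23, 34, 89], [36, 92, 19, 76]]  # one [cvln, tlln, cvbh, tlbh] per match

# zip(*rows) yields (11, 36), (23, 92), ...; sum each column.
covered_lines, total_lines, covered_branches, total_branches = map(sum, zip(*rows))
print(covered_lines, total_lines, covered_branches, total_branches)
```

The same transpose-and-sum pattern also appears in the Clojure answer below as `(apply map + ...)`.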

Answer 3

Here is the corresponding Clojure solution:

(defn extract-data
  "extract 4 integers from a string line according to a package name"
  [pkg line]
  (map read-string
       (rest (first
              (re-seq
               (re-pattern
                (str pkg ".*>(\\d+)/(\\d+).*>(\\d+)/(\\d+)"))
               line)))))

(defn scan-lines-by-pkg
  "scan all string lines and extract all data as integer sequences
    according to package names"
  [pkgs lines]
  (filter seq (for [pkg pkgs
                    line lines]
                (extract-data pkg line))))

(defn sum-data
  "add all data in valid lines together"
  [pkgs lines]
  (apply map + (scan-lines-by-pkg pkgs lines)))

(defn get-percent
  [covered all]
  (str (format "%.2f" (float (/ (* covered 100) all))) "%"))

(defn get-cov
  [pkgs lines]
  ;; compute the four sums once and destructure them
  (let [[cl tl cb tb] (sum-data pkgs lines)]
    {:line-cov   (get-percent cl tl)
     :branch-cov (get-percent cb tb)}))

(get-cov ["d" "f"] ["abc" "d>11/23d>34/89d" "e>25/65e>13/25e" "f>36/92f>19/76"])
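For readers coming from the Python side of the question, here is a rough port of this Clojure solution. It is a sketch, not part of the answer: the function names mirror the Clojure ones, `itertools.product` stands in for the two-binding `for` (which iterates over every pkg/line pair), and `map(sum, zip(*rows))` stands in for `(apply map + rows)`.

```python
import re
from itertools import product


def extract_data(pkg, line):
    """Port of extract-data: the 4 integers after `pkg`, or [] if there is no match."""
    m = re.search(re.escape(pkg) + r".*>(\d+)/(\d+).*>(\d+)/(\d+)", line)
    return [int(g) for g in m.groups()] if m else []


def get_cov(pkgs, lines):
    # (for [pkg pkgs line lines] ...) visits the cartesian product pkgs x lines;
    # (filter seq ...) drops the empty results.
    rows = [r for r in (extract_data(p, ln) for p, ln in product(pkgs, lines)) if r]
    # (apply map + rows) adds the rows column by column.
    cl, tl, cb, tb = map(sum, zip(*rows))
    return {"line-cov": f"{cl * 100 / tl:.2f}%", "branch-cov": f"{cb * 100 / tb:.2f}%"}


print(get_cov(["d", "f"], ["abc", "d>11/23d>34/89d", "e>25/65e>13/25e", "f>36/92f>19/76"]))
```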


3 answers

Answer 1
Score 3, 2013-09-29 11:42:06

Answer 2
Score 1, 2013-09-29 10:16:39

Answer 3
Score 0, accepted, 2013-10-14 13:03:48
