Haskell cassava (Data.Csv): Carry along additional columns
I have two .csv files.
A.csv:
A,B,C,D,E
1,2,3,4,5
5,4,3,2,1
B.csv:
A,E,B,C,F
6,7,8,9,1
4,3,4,5,6
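I want to read them in Haskell and apply strict parsing rules to the variables A, B and C. Then I want to apply complex merge and filter operations to the rows of A.csv and B.csv and create a file C.csv from the result. The code block at the end of this post basically covers this functionality.
Question:
I now want to do all of this while carrying along the variables D, E and F. In my real dataset I have an unknown and arbitrary number of such additional columns, so I cannot easily represent them in the respective data type (ABC below). All of them should be kept and correctly represented in the output dataset.
With the code below, C.csv looks like this: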
A,B,C
1,2,3
5,4,3
6,8,9
4,4,5
Instead, I would like to get a result like this:
A,B,C,D,E,F
1,2,3,4,5,_
5,4,3,2,1,_
6,8,9,_,7,1
4,4,5,_,3,6
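Is there a way to do this with cassava? Do I have to write a custom parser from scratch to get this functionality? How would I go about this?
This example code lacks the desired feature. It is a self-contained stack script.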
#!/usr/bin/env stack
-- stack --resolver lts-18.7 script --package cassava,bytestring,vector
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import qualified Data.ByteString.Lazy as B
import qualified Data.Csv as C
import qualified Data.Vector as V

data ABC = ABC {a :: Int, b :: Int, c :: Int} deriving Show

instance C.FromNamedRecord ABC where
    parseNamedRecord m =
        ABC <$> m C..: "A" <*> m C..: "B" <*> m C..: "C"

instance C.ToNamedRecord ABC where
    toNamedRecord ABC {..} =
        C.namedRecord ["A" C..= a, "B" C..= b, "C" C..= c]

decodeABC :: B.ByteString -> [ABC]
decodeABC x =
    case C.decodeByName x of
        Left err -> error err
        Right (_,xs) -> V.toList xs

header :: C.Header
header = V.fromList ["A", "B", "C"]

main :: IO ()
main = do
    fileA <- B.readFile "A.csv"
    fileB <- B.readFile "B.csv"
    let decodedA = decodeABC fileA
    let decodedB = decodeABC fileB
    putStrLn $ show decodedA
    putStrLn $ show decodedB
    B.writeFile "C.csv" $ C.encodeByName header (decodedA ++ decodedB)
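A side note on why the extra columns disappear in the script above: ABC names only A, B and C, so everything else in a row is discarded at parse time. cassava does hand the complete row to parseNamedRecord as a C.NamedRecord, and a C.NamedRecord can also be decoded directly. The following is a minimal sketch of that, not part of the original script; decodeRaw is a hypothetical helper name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import qualified Data.Csv as C
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V

-- Hypothetical helper: decode every row as an untyped name -> field map.
-- Nothing is validated here, but no column is dropped either.
decodeRaw :: BL.ByteString -> Either String [C.NamedRecord]
decodeRaw input = V.toList . snd <$> C.decodeByName input

-- Look up one raw field by column name in a decoded row.
rawField :: C.Name -> C.NamedRecord -> Maybe C.Field
rawField = HM.lookup
```

For example, rawField "D" applied to the rows of A.csv should still find the values of column D, which the ABC type above never sees.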
The following code includes the desired functionality (thanks to @Daniel Wagner for the input):
#!/usr/bin/env stack
-- stack --resolver lts-18.7 script --package cassava,bytestring,vector,unordered-containers
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import qualified Data.ByteString.Lazy as B
import qualified Data.Csv as C
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V

-- addCols carries every column that is not parsed into a typed field.
data ABC = ABC {a :: Int, b :: Int, c :: Int, addCols :: C.NamedRecord} deriving Show

-- Names of the strictly parsed columns.
abcDefinedCols :: [C.Name]
abcDefinedCols = ["A", "B", "C"]

-- The same names as HashMap keys, for use with HM.difference.
abcRefHashMap :: HM.HashMap C.Name ()
abcRefHashMap = HM.fromList $ map (\x -> (x, ())) abcDefinedCols

instance C.FromNamedRecord ABC where
    parseNamedRecord m =
        pure ABC
            <*> m C..: "A"
            <*> m C..: "B"
            <*> m C..: "C"
            <*> pure (m `HM.difference` abcRefHashMap)

instance C.ToNamedRecord ABC where
    toNamedRecord m =
        (addCols m) `HM.union` C.namedRecord ["A" C..= a m, "B" C..= b m, "C" C..= c m]

decodeABC :: B.ByteString -> [ABC]
decodeABC x =
    case C.decodeByName x of
        Left err -> error err
        Right (_,xs) -> V.toList xs

-- Typed columns first, then every additional column seen in any row.
makeCompleteHeader :: [ABC] -> C.Header
makeCompleteHeader ms = V.fromList $ abcDefinedCols ++ HM.keys (HM.unions (map addCols ms))

-- Concatenate both inputs and fill additional columns missing from a row
-- with "n/a" (the desired output above writes "_" instead).
combineABCs :: [ABC] -> [ABC] -> [ABC]
combineABCs xs1 xs2 =
    let simpleSum = xs1 ++ xs2
        addColKeys = HM.keys (HM.unions (map addCols simpleSum))
        toAddHashMap = HM.fromList (map (\k -> (k, "n/a")) addColKeys)
    in map (\x -> x { addCols = fillAddCols (addCols x) toAddHashMap }) simpleSum
    where
        fillAddCols :: C.NamedRecord -> C.NamedRecord -> C.NamedRecord
        fillAddCols cur toAdd = HM.union cur (toAdd `HM.difference` cur)

main :: IO ()
main = do
    fileA <- B.readFile "A.csv"
    fileB <- B.readFile "B.csv"
    let decodedA = decodeABC fileA
    let decodedB = decodeABC fileB
    putStrLn $ show decodedA
    putStrLn $ show decodedB
    let ab = combineABCs decodedA decodedB
    B.writeFile "C.csv" $ C.encodeByName (makeCompleteHeader ab) ab
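One caveat with makeCompleteHeader: HM.keys returns keys in an unspecified order, so the additional columns are not guaranteed to come out as D, E, F. If a stable column order matters, sorting the extra names is one option. A minimal sketch under that assumption; orderedHeader and the Row alias are hypothetical names, with Row having the same shape as C.NamedRecord and the result the same shape as C.Header:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.HashMap.Strict as HM
import           Data.List (sort)
import qualified Data.Vector as V

-- Same shape as cassava's NamedRecord (name -> raw field).
type Row = HM.HashMap BS.ByteString BS.ByteString

-- Hypothetical helper: fixed columns in the given order, followed by
-- all other column names found in any row, sorted bytewise.
orderedHeader :: [BS.ByteString] -> [Row] -> V.Vector BS.ByteString
orderedHeader fixed rows = V.fromList (fixed ++ extras)
  where
    extras = sort (filter (`notElem` fixed) (HM.keys (HM.unions rows)))
```

In the script above this could be called as orderedHeader abcDefinedCols (map addCols ab) in place of makeCompleteHeader ab.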
-- Assumes OverloadedStrings, an unqualified import of Data.Csv,
-- and: import qualified Data.HashMap.Strict as HM
data ABCPlus = ABCPlus { a :: Int, b :: Int, c :: Int, d :: NamedRecord } deriving Show

instance FromNamedRecord ABCPlus where
    parseNamedRecord m = pure ABCPlus
        <*> m .: "A"
        <*> m .: "B"
        <*> m .: "C"
        <*> pure m -- or perhaps: pure (m `HM.difference` HM.fromList [("A", ()), ("B", ()), ("C", ())])

instance ToNamedRecord ABCPlus where
    toNamedRecord m = d m -- or perhaps: d m `HM.union` namedRecord ["A" .= a m, "B" .= b m, "C" .= c m]

headers :: [ABCPlus] -> Header
headers ms = header $ ["A", "B", "C"] ++ HM.keys (relevant combined) where
    relevant m = m `HM.difference` HM.fromList [("A", ()), ("B", ()), ("C", ())] -- or perhaps: m
    combined = HM.unions [relevant (d m) | m <- ms]
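The header built this way contains every additional column, but a single record may still lack some of them. As far as I know, encodeByName fails at runtime when a header name is missing from a record, so rows probably need padding before they are written. A minimal sketch of just that step; padRecord and the Row alias are hypothetical names, and the "_" placeholder in the usage note is taken from the desired output in the question:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.HashMap.Strict as HM

-- Same shape as cassava's NamedRecord (name -> raw field).
type Row = HM.HashMap BS.ByteString BS.ByteString

-- Hypothetical helper: make sure every listed column exists in the row.
-- HM.union is left-biased, so values already present are kept and only
-- the missing columns receive the placeholder.
padRecord :: BS.ByteString -> [BS.ByteString] -> Row -> Row
padRecord placeholder cols row =
  row `HM.union` HM.fromList [(k, placeholder) | k <- cols]
```

Applied to the extra columns of the first row of A.csv with padRecord "_" ["D", "E", "F"], this keeps D and E as they are and adds F with the placeholder.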