Parsing .csv into clauses in Haskell

I am trying to convert a .csv file like

femin,femin,1,f,woman,women,
aqu,aqu,1,f,water,waters,
..

into a .pl file like

noun(femin,femin,1,f,trans(woman,women)).
noun(aqu,aqu,1,f,trans(water,waters)).
..

Here is my current source code:

-- get from "femin, femin, 1, f, woman, women" to noun(femin, femin, 1, f ,trans(woman,women)). 
import qualified Data.Attoparsec as P


data NounClause = NounClause
    {
        nom :: String,
        gen :: String,
        declension :: String,
        gender :: String,
        sgtrans :: String,
        pltrans :: String
    } deriving Show 

parseNounClause :: P.Parser NounClause

parseNounClause = do 
    nom <- String
    char ","
    gen <- String
    char ","
    declension <- String
    char ","
    gender <- String
    char ","
    sgtrans <- String
    char ","
    pltrans <- String
    return $ NounClause nom gen declension gender sgtrans pltrans

However, this does not seem to work. Why is that?

Also, how can I apply this parser to each line? Here is also my function that takes the parsed data and returns a string:

c = ","
convert :: NounClause -> String
convert NounClause = "noun(" ++ nom ++ c ++ gen ++ c ++ declension ++ c ++ gender ++ "trans(" ++ sgtrans ++ c ++ pltrans ++ "))."

Many thanks to anyone who helps me with this project; your contribution is very valuable to me.

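String is a type, not a parser, so nom <- String does not compile; likewise, char expects a single Char (','), not a String (","). And a parser that simply reads "a string" would be greedy: it would try to consume as much input as possible, including the commas in your file. So you construct a parser that reads everything except commas. (Data.Attoparsec works on ByteString; the code below uses the Text version, Data.Attoparsec.Text.)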

import qualified Data.Attoparsec.Text as P
import Data.Text (unpack)

-- One field: everything up to (but not including) the next comma.
entry :: P.Parser String
entry = fmap unpack (P.takeWhile (/= ','))

unpack converts the parsed value of type Text into a String.
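To see what entry does, and one pitfall, here is a small self-contained sketch. The field parser is a hypothetical variant that is not part of the answer: it also stops at a line ending, so a line that lacks the trailing comma cannot swallow the newline and run into the next line.

```haskell
import qualified Data.Attoparsec.Text as P
import Data.Text (pack, unpack)

-- The answer's field parser: everything up to (not including) the next comma.
entry :: P.Parser String
entry = fmap unpack (P.takeWhile (/= ','))

-- Hypothetical variant: additionally stop at '\n' or '\r'.
field :: P.Parser String
field = fmap unpack (P.takeWhile (\c -> c /= ',' && not (P.isEndOfLine c)))

main :: IO ()
main = do
  -- takeWhile stops at the first comma; the rest of the input is left unconsumed.
  print (P.parseOnly entry (pack "femin,femin,1"))   -- Right "femin"
  -- With no comma ahead, entry runs straight across the line break...
  print (P.parseOnly entry (pack "women\naqu"))      -- Right "women\naqu"
  -- ...while field stops at it.
  print (P.parseOnly field (pack "women\naqu"))      -- Right "women"
```

This only matters if some lines do not end with a comma; with the trailing comma present in every line, entry is sufficient.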

Then you need an additional parser that reads a comma:

-- Consume exactly one comma.
separator :: P.Parser Char
separator = P.char ','

Then we combine these to parse a NounClause:

parseNounClause :: P.Parser NounClause
parseNounClause = do 
    nom <- entry
    separator -- don't need the comma so no need to keep it.
    gen <- entry
    separator
    declension <- entry
    separator
    gender <- entry 
    separator
    sgtrans <- entry
    separator
    pltrans <- entry
    separator -- each line in your CSV also ends with a comma
    return $ NounClause nom gen declension gender sgtrans pltrans
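The same parser can also be written in applicative style, which avoids naming each intermediate result. The sketch below is self-contained: it repeats the record from the question (adding Eq, an assumption made only so results can be compared) and parses one line of the sample input.

```haskell
import qualified Data.Attoparsec.Text as P
import Data.Text (pack, unpack)

-- The record from the question; Eq is added only so results can be compared.
data NounClause = NounClause
  { nom        :: String
  , gen        :: String
  , declension :: String
  , gender     :: String
  , sgtrans    :: String
  , pltrans    :: String
  } deriving (Show, Eq)

entry :: P.Parser String
entry = fmap unpack (P.takeWhile (/= ','))

separator :: P.Parser Char
separator = P.char ','

-- Same behaviour as the do-block: every field, including the last one,
-- is followed by a comma. (p <* separator) runs both but keeps only p's result.
parseNounClause :: P.Parser NounClause
parseNounClause =
  NounClause <$> (entry <* separator) <*> (entry <* separator)
             <*> (entry <* separator) <*> (entry <* separator)
             <*> (entry <* separator) <*> (entry <* separator)

main :: IO ()
main = print (P.parseOnly parseNounClause (pack "femin,femin,1,f,woman,women,"))
```

A line without the trailing comma makes this parser fail, exactly like the do-block version.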

So now you want to read multiple lines. This works like the comma, but with a newline instead: we alternately parse a NounClause and a newline. Because n lines are separated by only n-1 newlines, the first line is parsed separately. The many' combinator then parses zero or more further lines of the same format.

multiLines :: P.Parser [NounClause]
multiLines = do x <- parseNounClause
                xs <- P.many' (do P.endOfLine
                                  clause <- parseNounClause
                                  return clause
                              )
                return (x:xs)
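Attoparsec also provides sepBy1, which expresses "one or more clauses separated by line endings" directly, without a special case for the first line. The sketch below assumes the same record and field parsers; cell and allLines are hypothetical names. Note that parseOnly does not require the whole input to be consumed, so if a line is malformed, multiLines silently stops before it; allLines adds skipSpace and endOfInput so that such a line is reported as an error instead.

```haskell
import qualified Data.Attoparsec.Text as P
import Data.Text (pack, unpack)

data NounClause = NounClause
  { nom, gen, declension, gender, sgtrans, pltrans :: String }
  deriving (Show, Eq)

entry :: P.Parser String
entry = fmap unpack (P.takeWhile (/= ','))

-- Hypothetical helper: one field plus the comma after it.
cell :: P.Parser String
cell = entry <* P.char ','

parseNounClause :: P.Parser NounClause
parseNounClause = NounClause <$> cell <*> cell <*> cell <*> cell <*> cell <*> cell

-- One or more clauses separated by line endings ("\n" or "\r\n").
multiLines :: P.Parser [NounClause]
multiLines = parseNounClause `P.sepBy1` P.endOfLine

-- Hypothetical stricter variant: skip trailing whitespace (e.g. a final newline),
-- then insist that nothing else is left, so a malformed line becomes an error.
allLines :: P.Parser [NounClause]
allLines = multiLines <* P.skipSpace <* P.endOfInput

main :: IO ()
main = do
  let good = pack "femin,femin,1,f,woman,women,\naqu,aqu,1,f,water,waters,\n"
      bad  = pack "femin,femin,1,f,woman,women,\naqu,aqu\n"
  print (P.parseOnly allLines good)                -- both clauses
  print (length <$> P.parseOnly multiLines bad)    -- Right 1: stops before the bad line
  print (length <$> P.parseOnly allLines bad)      -- a Left with an error message
```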

Now we need to run the parser on the file contents. This is done by the following main function, which needs one more import to read the file contents as Text:

import qualified Data.Text.IO as T(readFile)

main :: IO ()
main = do fileContents <- T.readFile "input.txt"
          let result = P.parseOnly multiLines fileContents
          case result of (Left s)   -> putStrLn s
                         (Right rs) -> sequence_ (map (putStrLn . show) rs)

The parse result is either an error message or a list of all the NounClauses. I use sequence_ (map (putStrLn . show) rs) to print them one per line; mapM_ print rs is an equivalent, shorter form.
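Since the goal is a .pl file rather than printed records, here is a sketch of the full pipeline that writes each clause using convert. The convert from the question does not compile as written: the pattern NounClause has to mention all six fields to bind them, and as written nom and the others refer to the record selector functions rather than to strings; there is also no comma before "trans(". The version below binds the fields by position. The output file name output.pl and the helper cell are assumptions.

```haskell
import qualified Data.Attoparsec.Text as P
import qualified Data.Text.IO as T
import Data.List (intercalate)
import Data.Text (unpack)
import System.IO (hPutStrLn, stderr)

data NounClause = NounClause
  { nom, gen, declension, gender, sgtrans, pltrans :: String }
  deriving Show

entry :: P.Parser String
entry = fmap unpack (P.takeWhile (/= ','))

-- Hypothetical helper: a field followed by its comma (lines end with a trailing comma).
cell :: P.Parser String
cell = entry <* P.char ','

parseNounClause :: P.Parser NounClause
parseNounClause = NounClause <$> cell <*> cell <*> cell <*> cell <*> cell <*> cell

-- First line, then zero or more (line ending, line) pairs, as in the answer.
multiLines :: P.Parser [NounClause]
multiLines = (:) <$> parseNounClause <*> P.many' (P.endOfLine *> parseNounClause)

-- Corrected convert: bind the six fields by position, comma before "trans(".
convert :: NounClause -> String
convert (NounClause n g d gd sg pl) =
  "noun(" ++ intercalate "," [n, g, d, gd, "trans(" ++ sg ++ "," ++ pl ++ ")"] ++ ")."

main :: IO ()
main = do
  contents <- T.readFile "input.txt"   -- input file name from the answer
  case P.parseOnly multiLines contents of
    Left err      -> hPutStrLn stderr ("parse error: " ++ err)
    -- one Prolog fact per line; output.pl is an assumed file name
    Right clauses -> writeFile "output.pl" (unlines (map convert clauses))
```

For the sample line, convert produces noun(femin,femin,1,f,trans(woman,women)). as requested in the question.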

Because you added deriving Show at the end of the definition, you already have the function show, which converts your data type into a String. If you want your own String representation, you can instantiate the type class yourself (instead of writing your convert function); remove deriving Show first, since a type can have only one Show instance:

instance Show NounClause where
    show n = ...
