Parsing .csv into clauses in Haskell
I am trying to convert a .csv of
femin,femin,1,f,woman,women,
aqu,aqu,1,f,water,waters,
..
into a .pl file like
noun(femin,femin,1,f,trans(woman,women)).
noun(aqu,aqu,1,f,trans(water,waters)).
..
Here is my current source code:
-- get from "femin, femin, 1, f, woman, women" to noun(femin, femin, 1, f ,trans(woman,women)).
import qualified Data.Attoparsec as P
data NounClause = NounClause
  {
    nom :: String,
    gen :: String,
    declension :: String,
    gender :: String,
    sgtrans :: String,
    pltrans :: String
  } deriving Show
parseNounClause :: P.Parser NounClause
parseNounClause = do
  nom <- String
  char ","
  gen <- String
  char ","
  declension <- String
  char ","
  gender <- String
  char ","
  sgtrans <- String
  char ","
  pltrans <- String
  return $ NounClause nom gen declension gender sgtrans pltrans
However, this does not seem to be working. Why is this so?
Also, how can I apply this parser to each line? Here is also my function that takes the parsed data and returns a string.
c = ","
convert :: NounClause -> String
convert NounClause = "noun(" ++ nom ++ c ++ gen ++ c ++ declension ++ c ++ gender ++ "trans(" ++ sgtrans ++ c ++ pltrans ++ "))."
I very much thank anyone who helps me on this project; their contribution is most valuable to me.
In your code, String is a type, not a parser, so the code as written will not compile. Even a parser that accepts arbitrary text would try to consume as much input as possible, and that includes the commas in your file. So you construct a parser that reads everything except for commas.
import qualified Data.Attoparsec.Text as P
import Data.Text(unpack)
entry = fmap unpack (P.takeWhile (/=','))
unpack is used to convert the parsed info of type Text into a String.
Then you need an additional parser that reads a comma.
separator = P.char ','
Then we combine these to parse a NounClause:
parseNounClause :: P.Parser NounClause
parseNounClause = do
  nom <- entry
  separator -- don't need the comma so no need to keep it.
  gen <- entry
  separator
  declension <- entry
  separator
  gender <- entry
  separator
  sgtrans <- entry
  separator
  pltrans <- entry
  separator
  return $ NounClause nom gen declension gender sgtrans pltrans
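For comparison, here is a minimal, self-contained sketch of the same single-line parser written in applicative style. The names field, fieldThenComma, nounLine and parseOneLine are made up for this illustration and are not part of the original answer. It also assumes, unlike the entry parser above, that a field should stop at a line break as well as at a comma, and it relies on every line ending with a trailing comma, as in the sample data.

```haskell
import qualified Data.Attoparsec.Text as P
import Data.Text (Text, pack, unpack)

data NounClause = NounClause
  { nom, gen, declension, gender, sgtrans, pltrans :: String }
  deriving (Show, Eq)

-- One field: everything up to the next comma or line break.
-- Stopping at line breaks is an assumption added in this sketch.
field :: P.Parser String
field = unpack <$> P.takeWhile (\ch -> ch /= ',' && ch /= '\n' && ch /= '\r')

-- A field followed by the comma that terminates it; the comma is discarded.
fieldThenComma :: P.Parser String
fieldThenComma = field <* P.char ','

-- Six comma-terminated fields make up one clause.
nounLine :: P.Parser NounClause
nounLine =
  NounClause <$> fieldThenComma <*> fieldThenComma <*> fieldThenComma
             <*> fieldThenComma <*> fieldThenComma <*> fieldThenComma

-- Run the parser on a single line of input.
parseOneLine :: Text -> Either String NounClause
parseOneLine = P.parseOnly nounLine
```

Calling parseOneLine (pack "femin,femin,1,f,woman,women,") should give a Right value holding the six fields, while a line with too few commas gives a Left with attoparsec's error message.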
So now you want to read multiple lines. This works the same way as with the comma, but now with a newline symbol. We alternate between parsing a clause and a newline symbol. To compensate for having n lines but only n-1 newline symbols, we treat the first line separately. The many' combinator allows us to parse zero or more lines of the same format.
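multiLines :: P.Parser [NounClause]
multiLines = do
  x <- parseNounClause
  -- many' comes from the qualified attoparsec import, hence the P. prefix
  xs <- P.many' (do P.endOfLine
                    clause <- parseNounClause
                    return clause)
  return (x:xs)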
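As an alternative to handling the first line separately, attoparsec's sepBy combinator expresses "items separated by newlines" directly. The sketch below is only one possible way to write it; the names nounFile and parseCsv are hypothetical, and it additionally assumes that trailing whitespace at the end of the file should be tolerated and that the whole input must be consumed.

```haskell
import qualified Data.Attoparsec.Text as P
import Data.Text (Text, pack, unpack)

data NounClause = NounClause
  { nom, gen, declension, gender, sgtrans, pltrans :: String }
  deriving (Show, Eq)

-- One comma-terminated field; stops at commas and line breaks (an assumption).
fieldThenComma :: P.Parser String
fieldThenComma =
  (unpack <$> P.takeWhile (\ch -> ch /= ',' && ch /= '\n' && ch /= '\r'))
    <* P.char ','

-- One line of the .csv: six comma-terminated fields.
nounLine :: P.Parser NounClause
nounLine =
  NounClause <$> fieldThenComma <*> fieldThenComma <*> fieldThenComma
             <*> fieldThenComma <*> fieldThenComma <*> fieldThenComma

-- Zero or more clauses separated by line breaks, then optional trailing
-- whitespace, then the end of the input.
nounFile :: P.Parser [NounClause]
nounFile = (nounLine `P.sepBy` P.endOfLine) <* P.skipSpace <* P.endOfInput

-- Parse the whole file contents at once.
parseCsv :: Text -> Either String [NounClause]
parseCsv = P.parseOnly nounFile
```

Requiring endOfInput is a design choice: without it, parseOnly would silently stop at the first malformed line and return only the clauses before it.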
So now we need to run this parser on the file. This is done by the following main function, for which we need another import to read the file contents as Text.
import qualified Data.Text.IO as T (readFile)
main :: IO ()
main = do
  fileContents <- T.readFile "input.txt"
  let result = P.parseOnly multiLines fileContents
  case result of
    (Left s) -> putStrLn s
    (Right rs) -> sequence_ (map (putStrLn . show) rs)
The parse result will get us either an error message or all NounClauses in a list. I use sequence_ (map (putStrLn . show) rs) to print it.
You have the function show, which can convert your data type into a String, because you added deriving Show at the end of the definition. If you want to use your own String representation, instantiate the type class yourself (instead of using your convert function) and remove the deriving Show clause, like:
instance Show NounClause where
  show n = ...
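If you would rather keep the derived Show for debugging, a plain rendering function is another option. Note that the convert function in the question matches on the bare constructor without binding any fields, and it leaves out the comma before trans(. The following is a minimal sketch that addresses both points; renderNoun is a hypothetical name, and the output format is taken from the .pl example in the question.

```haskell
import Data.List (intercalate)

data NounClause = NounClause
  { nom, gen, declension, gender, sgtrans, pltrans :: String }
  deriving (Show, Eq)

-- Render one clause as a Prolog fact in the format shown in the question.
-- The record selectors (nom, gen, ...) are applied to the argument n,
-- which is what the original convert function was missing.
renderNoun :: NounClause -> String
renderNoun n =
  "noun(" ++ intercalate "," [nom n, gen n, declension n, gender n, translation] ++ ")."
  where
    -- The nested trans(...) term holds the singular and plural translations.
    translation = "trans(" ++ sgtrans n ++ "," ++ pltrans n ++ ")"
```

For example, renderNoun (NounClause "femin" "femin" "1" "f" "woman" "women") yields noun(femin,femin,1,f,trans(woman,women)). Mapping renderNoun over the parsed list and joining the results with unlines would give the contents of the .pl file.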