Error parsing a char (――) in Haskell
I'm writing a parser to parse huge chunks of English text using attoparsec. Everything has been great so far, except for parsing this char "――". I know it is just 2 dashes together "--". The weird thing is, the parser catches it in this code:
wordSeparator :: Parser ()
wordSeparator = many1 (space <|> satisfy (inClass "――?!,:")) >> pure ()
but not in this case:
specialChars = ['――', '?', '!', ',', ':']
wordSeparator :: Parser ()
wordSeparator = many1 (space <|> satisfy (inClass specialChars)) >> pure ()
The reason I'm using the list specialChars is that I have a lot of characters to consider and I use it in multiple places. For example, given the input "I am ――Walt Whitman._", the output is supposed to be {"I", "am", "Walt", "Whitman."}. I believe it's mostly because "――" is not a Char? How do I fix this?
A Char is one character, full stop. ―― is two characters, so it is two Chars. You can fit as many Chars as you want into a String, but you certainly cannot fit two Chars into one Char. That is why the literal '――' in specialChars is rejected: a character literal holds exactly one character.
Since satisfy considers one character at a time, it probably isn't what you want if you need to parse a sequence of two characters as a single unit. The inClass function just produces a predicate on characters (inClass partially applied to one argument produces a function of type Char -> Bool), so inClass "――" is the same as inClass ['―', '―'], which is just the same as inClass ['―'] since duplicates are irrelevant. That won't help you much. Your first version only appears to work because many1 consumes the two dashes one after the other, each as a separate match.
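As a small illustration of the point about duplicates, the sketch below compares the two predicates on the dash character. It assumes the Text flavour of attoparsec (Data.Attoparsec.Text); the names dashCount and sameOnDash are made up for this example.

```haskell
import Data.Attoparsec.Text (inClass)

-- "――" is a String holding two Char values, not a single Char.
dashCount :: Int
dashCount = length "――"

-- A character class behaves like a set, so listing the dash twice
-- accepts exactly the same character as listing it once.
sameOnDash :: Bool
sameOnDash = inClass "――" '―' == inClass "―" '―'
```

Evaluating dashCount and sameOnDash in GHCi is a quick way to confirm that the class only ever sees individual characters.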
Consider using string instead of or in combination with inClass, since it is designed to handle sequences of characters. For example, something like this might better suit your needs:
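-- space and satisfy yield a Char while string yields the matched text,
-- so each alternative is wrapped in void (from Data.Functor) to share the type Parser ().
wordSeparator :: Parser ()
wordSeparator = many1 (void space <|> void (string "――") <|> void (satisfy (inClass "?!,:"))) >> pure ()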
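To show how this could fit into a whole program, here is a minimal sketch that splits a line into words. It assumes Data.Attoparsec.Text with OverloadedStrings; the helpers separator, word, and splitWords are hypothetical names, and the definition of a word (any run of characters that are not whitespace, not in specialChars, and not the dash) is an assumption made for the example. Each alternative is wrapped in void so that all of them have the type Parser ().

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  ( Parser, inClass, parseOnly, satisfy, sepBy
  , skipMany, skipMany1, space, string, takeWhile1 )
import Data.Char (isSpace)
import Data.Functor (void)
import Data.Text (Text)

-- Single-character separators, kept in one place so they can be reused.
specialChars :: String
specialChars = "?!,:"

-- One separator: whitespace, the two-character dash, or a special character.
-- The dash is matched with string because it spans two Chars.
separator :: Parser ()
separator =
      void space
  <|> void (string "――")
  <|> void (satisfy (inClass specialChars))

-- One or more separators in a row.
wordSeparator :: Parser ()
wordSeparator = skipMany1 separator

-- Assumption for this sketch: a word is a non-empty run of characters
-- that are not whitespace, not special characters, and not the dash.
word :: Parser Text
word = takeWhile1 isWordChar
  where
    isWordChar c = not (isSpace c) && not (inClass specialChars c) && c /= '―'

-- Skip any leading separators, then collect the words between separators.
splitWords :: Text -> Either String [Text]
splitWords = parseOnly (skipMany separator *> (word `sepBy` wordSeparator))
```

With these definitions, splitWords "I am ――Walt Whitman." should give Right ["I","am","Walt","Whitman."]. A lone single dash is neither a word character nor a separator here, so parsing simply stops at it; whether that is acceptable depends on the text being parsed.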