
Haskell extract substring within a string

My goal is to count how many times a substring occurs within a string. The substring I'm looking for has the form "[n]", where n can be any value.

My attempt splits the string with the words function, then builds a new list containing every string whose head is '[' and whose last is ']'.

The problem I ran into is that one of my input strings, when split with words, produced the token "[2]," (with a trailing comma). I still want this to count as an occurrence of "[n]".

As another example, I would want this string,

asdf[1]jkl[2]asdf[1]jkl

to return 3.

Here's the code I have:

-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."

-- Function that will take a list of Strings and return a list that contains
-- any String of the form [n], where n is a variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']

-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example :  ghci> references txt -- -> 3
references :: String -> Int
references txt = length (ref (words txt))
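To make the failure concrete, here is a small sketch of what words hands to ref for the two inputs mentioned above. The names tokens, matched, and oneToken are made up for illustration; ref is restated from the code above so the block stands alone.

```haskell
-- Restated from the question so the block is self-contained.
ref :: [String] -> [String]
ref xs = [x | x <- xs, head x == '[', last x == ']']

-- words splits on whitespace only, so punctuation stays glued to the token.
tokens :: [String]
tokens = words "In case of [2], this is a whale..."
-- ["In","case","of","[2],","this","is","a","whale..."]

-- "[2]," ends in ',', so the  last x == ']'  test rejects it.
matched :: [String]
matched = ref tokens
-- []

-- With no whitespace at all, the whole input is a single token starting with 'a'.
oneToken :: [String]
oneToken = words "asdf[1]jkl[2]asdf[1]jkl"
-- ["asdf[1]jkl[2]asdf[1]jkl"]
```

Both cases suggest that the brackets need to be located character by character rather than word by word.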

If anyone can enlighten me on how to search for a substring within a string, or how to parse a string given a substring, that would be greatly appreciated.

I would just use a regular expression, and write it like this:

import Text.Regex.Posix

txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."


-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"

main :: IO ()
main = print (references txt) -- outputs 3
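The pattern above reads as: a literal [, then zero or more digits, then a literal ]. For readers without a regex package installed, the same rule can be sketched with base only. This is an illustration of what the pattern accepts, not part of the regex library; countDigitRefs is a hypothetical name, and it is only expected to agree with the regex on ordinary inputs like the ones in this question.

```haskell
import Data.Char (isDigit)

-- Count occurrences of '[' followed by zero or more digits and then ']'.
countDigitRefs :: String -> Int
countDigitRefs [] = 0
countDigitRefs ('[' : xs) =
  case span isDigit xs of
    -- The digits are immediately followed by ']': one match, continue after it.
    (_, ']' : rest) -> 1 + countDigitRefs rest
    -- No closing bracket right after the digits: keep scanning after this '['.
    _               -> countDigitRefs xs
countDigitRefs (_ : xs) = countDigitRefs xs
```

Like the `*` in the regex, this accepts an empty pair of brackets; it rejects brackets containing anything other than digits, such as "[a]".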

Regex is huge overkill for such a simple problem.

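-- Count the bracketed references in a string.
references :: String -> Int
references = length . consume

-- Skip characters until a '[' is found, then collect the bracket's contents.
consume :: String -> [String]
consume []       = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_  :xs) = consume xs

-- Gather characters up to the closing ']'; return them with the remaining input.
consume' :: String -> (String, String)
consume' []       = ([], [])
consume' (']':xs) = ([], xs)
consume' (x  :xs) = let (v,rest) = consume' xs in (x:v, rest)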

consume skips input until it sees a [ , then calls consume' , which gathers everything up to the next ] (or to the end of the input if there is none).
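One consequence of the definition above is that a [ with no closing ] still counts as a reference. If that is not wanted, a variant built on the Prelude's break can require the closing bracket. This is only a sketch of that idea; bracketed and countRefs are names made up here.

```haskell
-- Extract the text inside each "[...]" pair; an unclosed '[' is ignored.
bracketed :: String -> [String]
bracketed s =
  case break (== '[') s of
    (_, [])       -> []                         -- no more '[' in the input
    (_, _ : rest) ->
      case break (== ']') rest of
        (_, [])           -> []                 -- '[' without a closing ']'
        (inner, _ : more) -> inner : bracketed more

-- Count the closed bracket pairs.
countRefs :: String -> Int
countRefs = length . bracketed
```

For the example input asdf[1]jkl[2]asdf[1]jkl, bracketed yields ["1","2","1"], so countRefs gives 3.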

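Here's a solution with sepCap from the replace-megaparsec package. Note that anySingle matches exactly one character, so this pattern recognizes [1] but not [10].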

import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void

txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
  "get to their goal, and in the end the thing they want the most ends " ++
  "up destroying them.  In case of [2], this is a whale..."

pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char
length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
3
