
Show a list of words repeated in Haskell

I need to write a function that finds repeated words in a string and returns them as a list of strings in order of occurrence, ignoring non-letters.

For example, at the Hugs prompt:

repetitions :: String -> [String]

repetitions > "My bag is is action packed packed."
output> ["is","packed"]
repetitions > "My name  name name is Sean ."
output> ["name","name"]
repetitions > "Ade is into into technical drawing drawing ."
output> ["into","drawing"]

To split a string into words, use the words function (in the Prelude). To eliminate non-word characters, filter with Data.Char.isAlphaNum. Zip the list together with its tail to get adjacent pairs (x, y). Fold the list, consing a new list that contains all x where x == y.

Something like:

import Data.Char (isAlphaNum)

repetitions s = map fst . filter (uncurry (==)) . zip l $ tail l
  where l = map (filter isAlphaNum) (words s)

I'm not sure that works, but it should give you a rough idea.
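The steps described above can be sketched as a small self-contained module. This is only one possible reading: a list comprehension stands in for the fold, and dropping tokens that become empty after cleaning (such as a lone ".") is an assumption added here, not part of the answer.

```haskell
import Data.Char (isAlphaNum)

-- Adjacent-pair approach: clean each word, pair it with its successor,
-- and keep the first component of every pair whose halves are equal.
repetitions :: String -> [String]
repetitions s = [x | (x, y) <- zip ws (drop 1 ws), x == y]
  where
    -- Assumption: tokens that are empty after cleaning (e.g. a lone ".")
    -- are dropped so they cannot pair up with each other.
    ws = filter (not . null) (map (filter isAlphaNum) (words s))
```

With the question's three inputs this yields ["is","packed"], ["name","name"] and ["into","drawing"].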

I am new to this language, so my solution may look ugly in the eyes of a Haskell veteran, but anyway:

let repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') x)))))

This part removes everything except letters and spaces from a string s:

filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') s

This one splits a string s into words and groups equal adjacent words into lists, returning a list of lists:

List.group (words s)

Then this part removes all lists with fewer than two elements:

filter (\x -> (length x) > 1) s

After that, we concatenate all the lists into one, dropping one element from each of them:

concat (map tail s)
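Putting the four steps together with named intermediate values may make the data flow easier to follow. This is a sketch rather than the answer's exact code: the qualified import of Data.List is an assumption about how List.group is brought into scope (older Hugs setups used a module named List), and the binding names are made up for illustration.

```haskell
import qualified Data.List as List

-- Each binding in the where clause corresponds to one step described above.
repetitions :: String -> [String]
repetitions s = concat (map tail repeated)
  where
    -- Hypothetical helper: True for ASCII letters and the space character.
    isKept c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' '
    cleaned  = filter isKept s                      -- letters and spaces only
    grouped  = List.group (words cleaned)           -- runs of equal adjacent words
    repeated = filter (\g -> length g > 1) grouped  -- runs of length two or more
```

For "My name  name name is Sean ." the grouped value is [["My"],["name","name","name"],["is"],["Sean"]], so dropping the head of the only long run leaves ["name","name"].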

This might be inelegant; however, it is conceptually very simple. I'm assuming that it's looking for consecutive duplicate words, like the examples.

-- a wrapper that allows you to give the input as a String
repetitions :: String -> [String]
repetitions s = repetitionsLogic (words s)

-- does the real work
repetitionsLogic :: [String] -> [String]
repetitionsLogic [] = []
repetitionsLogic [_] = []
repetitionsLogic (a:as)
    | a == head as = a : repetitionsLogic as
    | otherwise = repetitionsLogic as
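Note that the recursive version above compares raw tokens, so "packed" and "packed." would not count as a repetition. A hedged variant that cleans each word first might look like the following; the cleaning step with isAlpha and the helper name go are additions for illustration, not part of the answer.

```haskell
import Data.Char (isAlpha)

-- Explicit recursion over cleaned words, comparing each word with its successor.
repetitions :: String -> [String]
repetitions = go . filter (not . null) . map (filter isAlpha) . words
  where
    -- At least two words remain: compare the first two, then move on by one.
    go (a : rest@(b : _))
      | a == b    = a : go rest
      | otherwise = go rest
    -- Zero or one word left: nothing more can repeat.
    go _ = []
```

Matching on the first two elements in the pattern also avoids the call to head.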

Building on what Alexander Prokofyev answered:

repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))

Remove unnecessary parentheses:

repetitions x = concat (map tail (filter (\x -> length x > 1) (List.group (words (filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x)))))

Use $ to remove more parentheses (each $ can replace an opening parenthesis if its closing parenthesis is at the end of the expression):

repetitions x = concat $ map tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x

Replace the character ranges with functions from Data.Char, and merge concat and map into concatMap:

repetitions x = concatMap tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x

Use a section and function composition in point-free style to simplify (\x -> length x > 1) to ((>1) . length). This combines length with (>1) (a partially applied operator, or section) in a right-to-left pipeline.

repetitions x = concatMap tail $ filter ((>1) . length) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
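As a quick illustration of that rewrite, the two predicates can be written side by side. The names viaLambda and viaSection are hypothetical and exist only for this comparison.

```haskell
-- The lambda form used in the earlier steps.
viaLambda :: [a] -> Bool
viaLambda = \x -> length x > 1

-- The same predicate as a composition: take the length, then test (> 1).
viaSection :: [a] -> Bool
viaSection = (> 1) . length
```

Both return False for lists with zero or one element and True for anything longer.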

Eliminate the explicit "x" variable to make the overall expression point-free:

repetitions = concatMap tail . filter ((>1) . length) . List.group . words . filter (\c -> isAlpha c || isSeparator c)

Now the entire function, reading from right to left, is a pipeline that keeps only alphabetic or separator characters, splits the result into words, groups equal adjacent words, keeps only the groups with more than one element, and then drops the first element of each remaining group and concatenates what is left.
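For reference, here is a sketch of the final pipeline as a loadable module. The import lines are assumptions about how the names are brought into scope; in particular, List.group is taken to be Data.List.group imported qualified as List.

```haskell
import Data.Char (isAlpha, isSeparator)
import qualified Data.List as List

-- Read from the bottom up: clean, split, group, select, trim.
repetitions :: String -> [String]
repetitions =
  concatMap tail                                 -- drop one copy from each run
    . filter ((> 1) . length)                    -- keep runs of two or more
    . List.group                                 -- group equal adjacent words
    . words                                      -- split on whitespace
    . filter (\c -> isAlpha c || isSeparator c)  -- keep letters and separators
```

One caveat: isSeparator matches the space character but not '\n' or '\t', so on multi-line input the words on either side of a line break would be fused together. Using isSpace from Data.Char instead may be preferable there.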
