
How to filter Strings from a list in Haskell

I am trying to create a program that reads a text file, splits the text into a list of words, and then creates a list of tuples pairing each word with the number of times it occurs in the text. I then need to be able to remove certain words from the list and print the final list.

I have tried several ways to filter Strings from a list of Strings in Haskell, without success. The filter function seems to be the best fit for what I want to do, but I am not sure how to use it.

The code I have so far splits the text read from a file into a list of Strings:

toWords :: String -> [String]
toWords s = words s
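One detail worth knowing before filtering: words splits only on whitespace, so punctuation stays attached to the neighbouring word, and a filter for "the" will not catch "the,". A minimal sketch illustrating this (splitDemo is a hypothetical name used only for the example):

```haskell
-- words splits a String on runs of whitespace (spaces, tabs, newlines).
-- Punctuation is NOT removed, so "cat," and "cat" are different words.
splitDemo :: [String]
splitDemo = words "the cat,  the\tdog\nand the end."
-- splitDemo == ["the","cat,","the","dog","and","the","end."]
```

If the input contains punctuation, it would need to be stripped separately before a word filter can match such words.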

I then added this to remove specific Strings from the list:

toWords :: String -> [String]
toWords s = words s
toWords s = filter (`elem` "an")
toWords s = filter (`elem` "the")
toWords s = filter (`elem` "for")

I know this is wrong, but I am unsure how to do it. Can anyone please help me with this?

Here is my full code so far:

import Data.Char (toLower)
import Data.List (group, sort, sortBy)
import Data.Ord (comparing)

main = do
    contents <- readFile "testFile.txt"
    let lowContents = map toLower contents
    let outStr = countWords lowContents
    let finalStr = sortOccurrences outStr
    print finalStr

-- Counts all the words.
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)

-- Splits words.
toWords :: String -> [String]
toWords s = words s
toWords s = filter (`elem` "an")
toWords s = filter (`elem` "the")
toWords s = filter (`elem` "for")

-- Counts how often each string in the given list appears.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\ws -> (head ws, length ws)) . group . sort $ xs

-- Sorts the list by number of occurrences, least frequent first.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences = sortBy (comparing snd)
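The counting and sorting steps can be checked in isolation. The sketch below assumes the same group . sort pipeline; sortAscending, sortDescending and sample are hypothetical names, and the Down variant is an optional addition for most-frequent-first output, not part of the original code. Note that comparing needs its key function in parentheses, as in sortBy (comparing snd); without them, sortBy comparing snd is a type error.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- sort brings equal words next to each other, group collects each run of
-- equal words into its own (never empty) list, and each run becomes
-- (word, length of the run).
countOccurrences :: [String] -> [(String, Int)]
countOccurrences = map (\ws -> (head ws, length ws)) . group . sort

-- Sort by count, smallest first.
sortAscending :: [(String, Int)] -> [(String, Int)]
sortAscending = sortBy (comparing snd)

-- Wrapping the key in Down reverses the ordering: largest count first.
sortDescending :: [(String, Int)] -> [(String, Int)]
sortDescending = sortBy (comparing (Down . snd))

sample :: [String]
sample = words "b a c a b a"
-- countOccurrences sample == [("a",3),("b",2),("c",1)]
```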

This keeps every word except the forbidden ones:

toWords s = filter (\w -> w `notElem` ["an","the","for"]) (words s)
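The key difference from the attempt in the question is the second argument to elem. A String is a list of Char, so (`elem` "an") tests individual characters, whereas a list of Strings makes elem compare whole words. A small sketch contrasting the two (charTest and wordTest are illustrative names):

```haskell
-- A String is a list of Char, so (`elem` "an") tests single characters:
-- it keeps every character that is 'a' or 'n'.
charTest :: String
charTest = filter (`elem` "an") "banana"
-- charTest == "anana"

-- With a list of Strings, notElem compares whole words instead.
wordTest :: [String]
wordTest = filter (`notElem` ["an", "the", "for"]) (words "an apple for the road")
-- wordTest == ["apple","road"]
```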

Equivalent variants:

-- explicit not
toWords s = filter (\w -> not (w `elem` ["an","the","for"])) (words s)
-- using and (&&) instead of elem
toWords s = filter (\w -> w/="an" && w/="the" && w/="for") (words s)
-- using where to define a custom predicate
toWords s = filter predicate (words s)
     where predicate w = w/="an" && w/="the" && w/="for"
-- pointfree
toWords = filter (flip notElem ["an","the","for"]) . words
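Putting the pieces together, here is a minimal sketch of the complete program with the filter in place. Assumptions: the stop words live in a list named stopWords, and runOnFile is a hypothetical helper; with main = runOnFile "testFile.txt" it should behave like the program in the question, printing the counts least frequent first.

```haskell
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
import Data.Ord (comparing)

-- Words to drop; extend this list as needed.
stopWords :: [String]
stopWords = ["an", "the", "for"]

-- Split into words and remove the stop words.
toWords :: String -> [String]
toWords = filter (`notElem` stopWords) . words

-- Count how often each word occurs.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences = map (\ws -> (head ws, length ws)) . group . sort

-- Lower-case the text first so "The" and "the" count as the same word.
countWords :: String -> [(String, Int)]
countWords = countOccurrences . toWords . map toLower

-- Sort by number of occurrences, least frequent first.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences = sortBy (comparing snd)

-- Hypothetical helper: read a file and print the sorted counts.
runOnFile :: FilePath -> IO ()
runOnFile path = do
  contents <- readFile path
  print (sortOccurrences (countWords contents))
```

For example, countWords "The cat and the hat for an Cat" gives [("and",1),("cat",2),("hat",1)].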

filter is what Haskell calls a higher-order function: it takes another function (the predicate) as its first argument. Higher-order functions are well worth reading about, since they are very useful.
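To show what "higher-order" means concretely, here is a sketch of a filter-like function written by hand. It is named myFilter to avoid clashing with the Prelude, and it illustrates the behaviour rather than reproducing the Prelude's actual source:

```haskell
-- The first argument is itself a function (a predicate), which is what
-- makes myFilter a higher-order function.
myFilter :: (a -> Bool) -> [a] -> [a]
myFilter _ [] = []
myFilter keep (x : xs)
  | keep x = x : myFilter keep xs   -- predicate holds: keep x
  | otherwise = myFilter keep xs    -- predicate fails: drop x
```

For instance, myFilter even [1 .. 10] gives [2,4,6,8,10].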

Maybe what you are looking for is something like this:

toWords s = filter condition (words s)

Here condition is itself a function, a predicate of type String -> Bool: it must return True for each word you want to keep and False for each word you want to drop.
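A sketch of the same idea with the condition written as a separate, named function; the name isWanted and the three stop words are only illustrative:

```haskell
-- A named predicate: True for words that should be kept.
isWanted :: String -> Bool
isWanted w = w `notElem` ["an", "the", "for"]

-- The predicate is passed to filter like any other value.
toWords :: String -> [String]
toWords s = filter isWanted (words s)
```

With this, toWords "the end for now" gives ["end","now"].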

A small example: if you have a list of numbers and want to keep only the numbers greater than 10, you would write something like this:

filterNumbers n = filter (>10) n
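A slightly fuller version of that example, with type signatures added. filterEvenBig is a hypothetical extension showing that a condition can also combine several tests inside a lambda:

```haskell
-- Sections such as (> 10) are functions, so they can be passed to filter.
filterNumbers :: [Int] -> [Int]
filterNumbers n = filter (> 10) n

-- Keep numbers that are greater than 10 AND even.
filterEvenBig :: [Int] -> [Int]
filterEvenBig = filter (\x -> x > 10 && even x)
```

For example, filterNumbers [3, 12, 7, 25, 10] gives [12,25].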
