
How can I remove specific words from a string - Ruby

I have the following string, from which I want to extract any 'words' that do not contain numbers or special characters. For now, commas, question marks and full stops are accepted:

b? Dl )B 4(V! A. MK, YtG ](f 1m )CNxuNUR {PG?

Desired output:

b? Dl A. MK, YtG

5

Current output:

b? Dl A. MK, YtG 1m

6

At the moment, the function below removes numbers from the string; however, words that contain both numbers and letters are not omitted, hence the '1m' in my current output.

Current function:

def howMany(sentence)

    if sentence.is_a? String
        
        output = sentence.split
        count = 0

        test_output = []

        output.each {|word| 

            if word !~ /\D/ || word =~ /[!@#$%^&*()_+{}\[\]:;'"\/\\><]/
                count
            else
                test_output.push(word)
                count += 1
            end

        }   

        puts test_output 
        puts count 
    
    else
        puts "Please enter a valid string" 
    end

end 

My assumption is that I'll have to iterate through each character of every word to check whether it includes a number, but I'm not sure how to go about that. I tried using .split("") inside my output.each block but was unsuccessful after a few attempts.

Any suggestions would be hugely appreciated.

Thanks in advance!

This is a job for String#scan using a regular expression.

str = "b? Dl )B 4(V! A. MK, YtG ](f 1m )CNxuNUR {PG?"
str.scan(/(?<!\S)[a-z.,\?\r\n]+(?!\S)/i)
  #=> ["b?", "Dl", "A.", "MK,", "YtG"]

Ruby demo <¯\_(ツ)_/¯> PCRE demo

I've included a link to regex101.com, a popular site for testing regular expressions, because it provides extensive information; in particular, hovering the cursor over each element of the expression shows an explanation of its function. As that site does not support Ruby's regex engine (Onigmo for v2.0+), I've selected the PCRE regex engine, which in this case gives the same result as Ruby's engine.


The regular expression can be written in free-spacing mode to make it self-documenting.

/
(?<!\S)         # negative lookbehind asserts that the following
                # match is not preceded by a character other than
                # a whitespace
[a-z.,\?\r\n]+  # match one or more of the indicated characters
(?!\S)          # negative lookahead asserts that the previous
                # match is not followed by a character other than
                # a whitespace
/ix             # case-insensitive (i) and free-spacing regex
                # definition modes
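
As a minimal sketch, the free-spacing pattern above can be assigned to a constant and passed to String#scan, which also makes it easy to print the word count the question asks for. The constant name WORD_PATTERN is only an illustrative choice, not something from the question.

```ruby
# WORD_PATTERN is a hypothetical name chosen for this sketch.
WORD_PATTERN = /
  (?<!\S)          # not preceded by a non-whitespace character
  [a-z.,?\r\n]+    # one or more letters, full stops, commas or question marks
  (?!\S)           # not followed by a non-whitespace character
/ix

str = "b? Dl )B 4(V! A. MK, YtG ](f 1m )CNxuNUR {PG?"
words = str.scan(WORD_PATTERN)

puts words       # one accepted word per line
puts words.size  # number of accepted words
```

Note that in free-spacing mode a literal space inside a character class is still significant in Ruby, so the class itself is best kept free of whitespace.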

Alternatively, to avoid the need for the negative lookbehind (?<!\S) and the negative lookahead (?!\S), one could split and then select:

str.split.select { |s| s.match?(/\A[a-z.,\?\r\n]+\z/i) }
  #=> ["b?", "Dl", "A.", "MK,", "YtG"]
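
For context, here is a small sketch of why '1m' survives the check in the question. The test word !~ /\D/ is true only when the word contains no non-digit character at all, so it rejects "42" but not "1m". The helper names only_digits? and any_digit? are hypothetical and exist only for this illustration.

```ruby
# The question's test: true only when the word has no non-digit character.
def only_digits?(word)
  word !~ /\D/
end

# A possible replacement: true when a digit appears anywhere in the word.
def any_digit?(word)
  word.match?(/\d/)
end

puts only_digits?("42")  # all digits, so the original check rejects it
puts only_digits?("1m")  # contains "m", so the original check lets it through
puts any_digit?("1m")    # a digit is present, so this check would reject it
```

Swapping the first test for the second inside the original loop would be the smallest change, although the anchored pattern used with select above covers both the digit and the special-character cases in one expression.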

I would suggest trying something like this.

Turn the sentence into an array of words with sentence.split(' '). Then keep only the words that match the pattern using filter, and use the filtered list for both puts operations. It should look something like this.

def how_many(sentence)
  sentence.split(' ').filter { |word| matches_pattern?(word) }.tap do |words|
    puts words.size
    puts words # or words.join(' ')
  end
end

def matches_pattern?(word)
  word.match?(/some_regular_expression/) # String#match? (Ruby 2.4+); substitute your own pattern
end

You can of course modify this to handle any edge cases, etc. This would be a more idiomatic solution.

Note that you can also use .filter(&method(:matches_pattern?)), but that might be confusing to some.
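
As a rough sketch of how the pieces could fit together, the version below fills the placeholder with one possible pattern and uses the method-reference form. The constant ALLOWED_WORD and its regex are assumptions based on the rules stated in the question (letters, commas, full stops and question marks only). This variant returns the filtered words instead of printing inside the method, and filter requires Ruby 2.6 or later (select is the older alias).

```ruby
# ALLOWED_WORD is a hypothetical pattern inferred from the question's rules.
ALLOWED_WORD = /\A[a-z.,?]+\z/i

def matches_pattern?(word)
  word.match?(ALLOWED_WORD)
end

# Returns the accepted words; printing is left to the caller.
def how_many(sentence)
  sentence.split.filter(&method(:matches_pattern?))
end

words = how_many("b? Dl )B 4(V! A. MK, YtG ](f 1m )CNxuNUR {PG?")
puts words.size
puts words
```

Returning the array rather than printing it keeps the method easy to test and reuse.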

Edit: rubular.com is a good place to try your regexps.

Edit: when things get hard, try breaking them into smaller chunks (i.e. try not to write methods longer than 5 lines).

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.
