
How to understand gsub(/^.*\//, '') or the regex

I'm breaking down the code below to check my understanding of the regex and of gsub:

str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
# str => "ghi.rb"

^ : beginning of the string

\/ : escape character for /

^.*\/ : everything from beginning to the last occurrence of / in the string

Is my understanding of the expression right?

How does .* work exactly?

Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.

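However, note that String#gsub doesn't change the string in place; it returns a new string. Your code works because it assigns the result back to str. Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.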
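A minimal sketch of that difference, using made-up variable names (kept, mutated, and so on are only for illustration):

```ruby
pattern = /^.*\//

# gsub returns a new string and leaves the receiver untouched.
kept = "abc/def/ghi.rb".dup
result = kept.gsub(pattern, '')
# kept   => "abc/def/ghi.rb"
# result => "ghi.rb"

# gsub! modifies the receiver itself.
mutated = "abc/def/ghi.rb".dup
mutated.gsub!(pattern, '')
# mutated => "ghi.rb"

# gsub! returns nil when nothing was substituted, so avoid chaining on it.
no_change = "ghi.rb".dup.gsub!(pattern, '')
# no_change => nil
```

Because gsub! can return nil, writing str = str.gsub!(...) is a common source of bugs; either assign the result of gsub or call gsub! on its own.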


As to how .* works: the algorithm the regex engine uses is called backtracking. Since .* is greedy (it tries to match as many characters as possible), you can think of the match as proceeding something like this:

Step 1: .* matches the entire string abc/def/ghi.rb. Then \/ tries to match a forward slash but fails (nothing is left to match), so .* has to backtrack.
Step 2: .* matches the entire string except the last character, abc/def/ghi.r. Then \/ tries to match a forward slash but fails (/ != b), so .* has to backtrack.
Step 3: .* matches the entire string except the last two characters, abc/def/ghi. (including the dot). Then \/ tries to match a forward slash but fails (/ != r), so .* has to backtrack.
...
Step n: .* matches abc/def. Then \/ tries to match a forward slash and succeeds. The matching ends here.
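The steps above can be imitated with a short loop. This is only a simplified model of the idea, not how Ruby's regex engine is actually implemented, and the names attempts and matched are hypothetical:

```ruby
str = "abc/def/ghi.rb"

attempts = []   # what .* held on each try, longest first
matched = nil

# Start with .* holding the whole string, then give back one character at a time.
str.length.downto(0) do |len|
  candidate = str[0, len]   # the part currently claimed by .*
  next_char = str[len]      # the character \/ has to match (nil at the end)
  attempts << candidate

  if next_char == "/"
    matched = candidate + next_char
    break
  end
end

# matched       => "abc/def/"
# attempts.size => 8
```

The result agrees with what the real engine reports for str[/^.*\//].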

No, not quite.

  • ^ : beginning of a line
  • \/ : escaped slash (the escape character is \ alone)
  • ^.*\/ : everything from the beginning of a line to the last occurrence of / on that line

.* depends on the mode of the regex. In single-line mode (i.e., without the m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with the m option), it means the longest possible sequence of zero or more characters, newlines included.
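A small sketch of how these distinctions show up on a two-line string. The input text is invented for illustration; \A is Ruby's anchor for the beginning of the string:

```ruby
text = "a/b\nc/d"

# ^ matches at the start of every line, so each line loses its prefix.
per_line = text.gsub(/^.*\//, '')
# per_line => "b\nd"

# \A matches only at the very start of the string.
string_start = text.gsub(/\A.*\//, '')
# string_start => "b\nc/d"

# With the m option, . also matches newlines, so .* runs to the last / overall.
with_m = text.gsub(/^.*\//m, '')
# with_m => "d"
```

For a single-line string such as "abc/def/ghi.rb" all three patterns give the same result, which is why the difference is easy to overlook.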

Your understanding is correct, but you should also note that the last statement is true because:

Repetition is greedy by default: as many occurrences as possible 
are matched while still allowing the overall match to succeed. 

Quoted from the Regexp documentation.
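To see the effect of greediness, here is a short sketch comparing .* with its lazy counterpart .*? on the string from the question:

```ruby
str = "abc/def/ghi.rb"

# Greedy: .* takes as much as it can, so the match ends at the last slash.
greedy = str[/^.*\//]
# greedy => "abc/def/"

# Lazy: .*? takes as little as it can, so the match ends at the first slash.
lazy = str[/^.*?\//]
# lazy => "abc/"

after_greedy = str.sub(/^.*\//, '')
# after_greedy => "ghi.rb"
after_lazy = str.sub(/^.*?\//, '')
# after_lazy => "def/ghi.rb"
```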

Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).

gsub replaces the match with the second argument (the empty string '').

Nothing wrong with your regex, but File.basename(str) might be more appropriate.

To expound on what @Stefen said: it really looks like you're dealing with a file path, and that makes your question an XY problem, where you're asking about Y when you should ask about X. Rather than how to use and understand a regex, the question should be what tool to use to manage paths.

Instead of rolling your own code, use code already written that comes with the language:

str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]

The reason to take advantage of File's built-in code is that it takes into account the difference between directory delimiters in *nix-style OSes and Windows. Ruby uses "/" as File::SEPARATOR on every platform, and on Windows it additionally sets File::ALT_SEPARATOR to "\", so the File methods there accept both delimiters:

File::SEPARATOR # => "/"
File::ALT_SEPARATOR # => "\\" on Windows, nil elsewhere

If your code moves from one system to another, it will continue working if you use the built-in methods, whereas a regex that hard-codes / will fail on paths written with the other delimiter.
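Delimiters are not the only case the built-in methods handle. As one more sketch, with example paths invented for illustration, here is how the regex and File.basename differ on a path with a trailing slash:

```ruby
pattern = /^.*\//

# For a plain file path the two approaches agree.
file_path = "abc/def/ghi.rb"
regex_file = file_path.gsub(pattern, '')
basename_file = File.basename(file_path)
# regex_file    => "ghi.rb"
# basename_file => "ghi.rb"

# With a trailing slash the regex removes everything,
# while File.basename still returns the last path component.
dir_path = "abc/def/"
regex_dir = dir_path.gsub(pattern, '')
basename_dir = File.basename(dir_path)
# regex_dir    => ""
# basename_dir => "def"
```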


 