I'm breaking down the code below to check my understanding of the regex and gsub:
str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
# str is now "ghi.rb"
^ : beginning of the string
\/ : escape character for /
^.*\/ : everything from beginning to the last occurrence of / in the string
Is my understanding of the expression right? How does .* work exactly?
Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.
However, note that String#gsub doesn't change the string in place; it returns a new string. Your code works because it assigns the result back to str. Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.
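To make the difference between gsub and gsub! concrete, here is a minimal sketch. The variable names (original, result, mutable) are only illustrative and do not come from the question.

```ruby
original = "abc/def/ghi.rb"

# gsub returns a new string and leaves the receiver untouched.
result = original.gsub(/^.*\//, '')
# result   => "ghi.rb"
# original => "abc/def/ghi.rb"

# gsub! modifies the receiver itself. It returns nil when nothing was replaced,
# so its return value should not be relied on as the new string.
mutable = original.dup
mutable.gsub!(/^.*\//, '')
# mutable => "ghi.rb"
```

Assigning the result back, as in str = str.gsub(...), gives the same final value as gsub! but creates a new string object instead of mutating the existing one.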
How .* works: the algorithm the regex engine uses is called backtracking.
Since .* is greedy (it will try to match as many characters as possible), you can think of it as something like this happening:
Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character: abc/def/ghi.r. Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters: abc/def/ghi. Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
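The steps above can be imitated with a small loop. This is only a sketch of the idea, not how Ruby's regex engine is actually implemented; simulate_greedy_dot_star is a hypothetical helper name, and the sketch assumes the input contains no newline (which . would not match without the m option).

```ruby
# Hypothetical helper: mimics greedy .* followed by \/ by giving back
# one character at a time until the next character is a slash.
def simulate_greedy_dot_star(str)
  str.length.downto(0) do |len|
    candidate = str[0, len]  # what .* currently holds
    next_char = str[len]     # what \/ tries to match (nil at end of string)
    return candidate + "/" if next_char == "/"
  end
  nil  # no slash anywhere, so the pattern cannot match
end

str = "abc/def/ghi.rb"
simulated = simulate_greedy_dot_star(str)
actual = str[/^.*\//]
# simulated => "abc/def/"
# actual    => "abc/def/"
```

The loop starts with the longest candidate and shrinks it, which is why the match ends at the last slash rather than the first one.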
No, not quite.
^ : beginning of a line
\/ : escaped slash (the escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / on that line
.* depends on the mode of the regex. In singleline mode (i.e., without the m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with the m option), it means the longest possible sequence of zero or more characters.
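A short sketch of how these modes change the result on a string with two lines. The sample text is made up for illustration; \A is Ruby's anchor for the start of the whole string.

```ruby
text = "a/b\nc/d.rb"

# Without m, . stops at "\n" and ^ matches at the start of every line,
# so each line is stripped up to its own last slash.
per_line = text.gsub(/^.*\//, '')
# per_line => "b\nd.rb"

# With m, . also matches "\n", so .* runs across lines to the very last slash.
across_lines = text.gsub(/^.*\//m, '')
# across_lines => "d.rb"

# \A anchors at the start of the string only, so just the first line is affected.
string_start = text.gsub(/\A.*\//, '')
# string_start => "b\nc/d.rb"
```

For a single-line string such as "abc/def/ghi.rb", all three patterns give the same result.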
Your understanding is correct, but you should also note that the last statement is true because:
"Repetition is greedy by default: as many occurrences as possible are matched while still allowing the overall match to succeed."
(Quoted from the Regexp documentation.)
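To see what greedy means in practice, it can help to compare it with the lazy form .*?, which matches as few characters as possible. A minimal sketch using the string from the question:

```ruby
str = "abc/def/ghi.rb"

greedy = str[/^.*\//]   # .* takes as much as it can
# greedy => "abc/def/"

lazy = str[/^.*?\//]    # .*? takes as little as it can
# lazy => "abc/"

after_lazy_sub = str.sub(/^.*?\//, '')
# after_lazy_sub => "def/ghi.rb"
```

With the lazy quantifier the match stops at the first slash, so only the greedy version leaves just the file name.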
Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).
gsub replaces the match with the second argument (the empty string '').
Nothing wrong with your regex, but File.basename(str) might be more appropriate.
To expound on what @Stefen said: It really looks like you're dealing with a file path, and that makes your question an XY problem where you're asking about Y when you should ask about X: Rather than how to use and understand a regex, the question should be what tool is used to manage paths.
Instead of rolling your own code, use code already written that comes with the language:
str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]
The reason you want to take advantage of File's built-in code is that it takes into account the difference between directory delimiters in *nix-style OSes and Windows. Ruby exposes the primary delimiter as the File::SEPARATOR constant and, on Windows, also sets File::ALT_SEPARATOR to the backslash:
File::SEPARATOR # => "/"
If your code moves from one system to another, it will continue working if you use the built-in methods, whereas a hard-coded regex can break because the delimiter may be different.
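A few more File methods cover tasks that might otherwise tempt you into another regex. This is a small sketch using the path from the question; the variable names are illustrative.

```ruby
path = "abc/def/ghi.rb"

# Passing a suffix as the second argument strips it from the base name.
name_without_ext = File.basename(path, ".rb")
# name_without_ext => "ghi"

extension = File.extname(path)
# extension => ".rb"

# File.join puts the pieces back together with the separator.
rebuilt = File.join(File.dirname(path), File.basename(path))
# rebuilt => "abc/def/ghi.rb"
```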