
Regex to get folders and file name parts of a path

I'm trying to parse a partial URL path into 3 groups.

  • Group 1 - Jobs or Docs.
  • Group 2 - The folder path (or empty if the file is directly under Jobs or Docs).
  • Group 3 - The file name (or empty if only a folder path is specified).

For example:

  • /Jobs/STU0001/Folder1/Sub Folder A/File Name.txt - should match on all groups.
  • /Docs/Folder 2 - should match on groups 1 and 2.
  • /Docs/Another File.doc - should match on groups 1 and 3.

I've tried the following (and other slight variations of the same) but just can't get one pattern to suit all the possible inputs.

^/?(Jobs|Docs)/(.*)/(.+\..+)?$ - Works for 1, not 2 or 3

^/?(Jobs|Docs)/(.*)/?(.+\..+)?$ - Works for 2, not 1 or 3

For info:

  • File names will always have an extension (and therefore a full stop/period).
  • Folder names will never have a full stop/period in their name.
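
The two attempts above can be checked against the three sample paths with a short script. This is only a sketch: the thread does not say which language is in use, so Python's `re` module is an assumption, and `groups_or_none` is a hypothetical helper name.

```python
import re

# The two patterns from the question, with the period escaped once (\.).
ATTEMPT_1 = re.compile(r"^/?(Jobs|Docs)/(.*)/(.+\..+)?$")
ATTEMPT_2 = re.compile(r"^/?(Jobs|Docs)/(.*)/?(.+\..+)?$")

# The three sample paths from the question.
SAMPLES = [
    "/Jobs/STU0001/Folder1/Sub Folder A/File Name.txt",
    "/Docs/Folder 2",
    "/Docs/Another File.doc",
]


def groups_or_none(pattern, path):
    """Return the three captured groups, or None if the pattern does not match."""
    m = pattern.match(path)
    return m.groups() if m else None


for path in SAMPLES:
    print(path)
    print("  attempt 1:", groups_or_none(ATTEMPT_1, path))
    print("  attempt 2:", groups_or_none(ATTEMPT_2, path))
```

The first pattern requires a slash after the folder group, so it fails outright on the two shorter paths. In the second pattern everything after `(.*)` is optional, so the greedy `(.*)` swallows the file name and group 3 stays empty.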

Regex : \/?(Jobs|Docs)(?:\/(.+)(?=\/|$))?(?:\/?([^\.]+\.[a-z]+))?

Output :

Match 1
Full match  0-48    `/Jobs/STU0001/Folder1/Sub Folder A/File Name.txt`
Group 1.    1-5     `Jobs`
Group 2.    6-34    `STU0001/Folder1/Sub Folder A`
Group 3.    35-48   `File Name.txt`

Match 2
Full match  49-71   `/Docs/Another File.doc`
Group 1.    50-54   `Docs`
Group 3.    55-71   `Another File.doc`

Match 3
Full match  72-86   `/Docs/Folder 2`
Group 1.    73-77   `Docs`
Group 2.    78-86   `Folder 2`


Another one:

^/(Docs|Jobs)(?:/([^.\n]*))?(?:/([^/\n]+\.[^/\n]+))?$

Broken apart:

^ Start of line

/ Initial slash

(Docs|Jobs) Captures the first directory

(?:/([^.\n]*))? Matches a slash and captures the folder part.

(?:/([^/\n]+\.[^/\n]+))? Matches a slash and captures the filename part.

$ End of string

The directory part can contain anything except periods and line feeds.

The filename part must contain three parts - 1) a filename not containing slashes or line feeds, 2) a period, and 3) an extension not containing slashes or line feeds.

Both are optional.
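
As a quick check, here is a minimal sketch that runs this pattern over the three paths from the question. Python is an assumption (the thread does not name a language), and `split_path` is a hypothetical helper that turns unmatched optional groups into empty strings.

```python
import re

# The pattern from this answer, unchanged.
PATH_RE = re.compile(r"^/(Docs|Jobs)(?:/([^.\n]*))?(?:/([^/\n]+\.[^/\n]+))?$")


def split_path(path):
    """Split a path into (root, folder, filename); return None if it does not match.

    Optional groups that did not take part in the match come back as None
    from the regex engine; they are converted to empty strings here.
    """
    m = PATH_RE.match(path)
    if m is None:
        return None
    root, folder, filename = m.groups()
    return root, folder or "", filename or ""


for p in (
    "/Jobs/STU0001/Folder1/Sub Folder A/File Name.txt",
    "/Docs/Folder 2",
    "/Docs/Another File.doc",
):
    print(p, "->", split_path(p))
```

Because the folder group excludes periods, it cannot consume a file name. For `/Docs/Another File.doc` the engine backtracks until the folder group is skipped entirely, which leaves the whole remainder to the filename group.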

See it here at regexstorm.

See a better visual illustration here at regex101.
