
RegEx for matching a string that contains file names and their extension

I have the following RegEx:

(^[\w]+[\w|\s-]*\.[A-Za-z0-9]+$)(,|;\s*^[\w]+[\w|\s-]*\.[A-Za-z0-9]+$)*

Where,

^[\w]+[\w|\s-]*\.[A-Za-z0-9]+$

matches a single file name, for example:

"someFile.txt"

This works as expected: no leading whitespace, and the name must have an extension (which extension does not matter).

With the first RegEx I want to match a list of such file names, separated by a comma (,) or a semicolon (;) followed by whitespace, such as:

"someFile.txt, oneMoreFile.bat, anotherFile.doc"

 or

"someFile.txt; oneMoreFile.bat; anotherFile.doc"

The idea was to match one file and then 0..n more files thereafter.

The problem is that the first RegEx does not match these lists, and I cannot find out why. I've gone through the expression numerous times without spotting the error. I've also put it into RegEx debugging tools and read their explanations of the expression; they all make sense, but the pattern still does not match.

Edit: I forgot to mention that the first RegEx does match when the string contains only one file (no comma or semicolon).

Your current pattern, while on the right track, does not seem to work. Rather than pointing out a few problems, I would suggest this regex pattern:

^\w[\w\s]*\.[A-Za-z0-9]+(?:[,;]\s*\w[\w\s]*\.[A-Za-z0-9]+)*$

To make the explanation simpler, let's assume that both the filename and the extension contain only word characters ( \w ). Then we could write the following simplified pattern:

^\w+\.\w+(?:[,;]\s*\w+\.\w+)*$

This says to match:

^         from the start of the string
\w+       an initial filename
\.        a dot
\w+       an initial extension
(?:       start of a non-capturing group
    [,;]  a comma or semicolon separator
    \s*   optional whitespace in between previous and current filename
    \w+   a subsequent filename
    \.    a dot
    \w+   a subsequent extension
)*        zero or more such extra filenames
$         end of the string
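
As a quick check of the breakdown above, here is a minimal Python sketch that runs both the simplified pattern and the fuller pattern against a few inputs. The sample strings other than those from the question, and the helper name `matches`, are made up for illustration; Python's `re` module is just one possible engine, and other regex flavors may differ in details such as what `\w` covers.

```python
import re

# Simplified pattern: file names and extensions are word characters only.
SIMPLE = re.compile(r"^\w+\.\w+(?:[,;]\s*\w+\.\w+)*$")

# Fuller pattern from the answer: whitespace is allowed inside a file name.
FULL = re.compile(
    r"^\w[\w\s]*\.[A-Za-z0-9]+(?:[,;]\s*\w[\w\s]*\.[A-Za-z0-9]+)*$"
)

samples = [
    "someFile.txt",
    "someFile.txt, oneMoreFile.bat, anotherFile.doc",
    "someFile.txt; oneMoreFile.bat; anotherFile.doc",
    "someFile.txt,",         # trailing separator (illustrative)
    " someFile.txt",         # leading whitespace (illustrative)
    "someFile",              # no extension (illustrative)
    "some file.txt, b.doc",  # space inside a name (illustrative)
]


def matches(pattern, text):
    """Return True if the anchored pattern matches the text (hypothetical helper)."""
    return pattern.match(text) is not None


for s in samples:
    print(f"{s!r:55} simple={matches(SIMPLE, s)} full={matches(FULL, s)}")
```

Both patterns accept the single file and the two lists from the question, and reject the trailing separator, the leading whitespace, and the missing extension. Only the fuller pattern accepts the name with a space in it, because `[\w\s]*` allows whitespace after the first character.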

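Your regex has two issues. First, the anchors are in the wrong place: the $ inside the first group requires the string to end right after the first file name, and the ^ inside the second group can never match in the middle of the string. Second, the alternation ,|; is not grouped, so the second group means "a comma, or a semicolon followed by the rest of the pattern".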

You can try this smaller version: /^([\w]+[\w|\s-]*\.[A-Za-z\d]+((,\s*)(?!$)|$))*$/

The part ((,\s*)(?!$)|$) says that a comma may follow the main expression only if it is not immediately followed by the end of input; otherwise the end of input must come next.

Demo: https://regex101.com/r/pDTxJh/3
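
To see the difference in behaviour, the following Python sketch compares the pattern from the question with the smaller version suggested here, both written with single backslashes. The helper name `is_list` is hypothetical, and `fullmatch` is used so that both patterns must cover the whole string.

```python
import re

# Pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"(^[\w]+[\w|\s-]*\.[A-Za-z0-9]+$)"
    r"(,|;\s*^[\w]+[\w|\s-]*\.[A-Za-z0-9]+$)*"
)

# Smaller version suggested in this answer.
SUGGESTED = re.compile(r"^([\w]+[\w|\s-]*\.[A-Za-z\d]+((,\s*)(?!$)|$))*$")


def is_list(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


for text in [
    "someFile.txt",
    "someFile.txt, oneMoreFile.bat, anotherFile.doc",
    "someFile.txt; oneMoreFile.bat; anotherFile.doc",
    "someFile.txt,",
]:
    print(f"{text!r:55} original={is_list(ORIGINAL, text)} "
          f"suggested={is_list(SUGGESTED, text)}")
```

The original pattern accepts only the single file name. The suggested pattern also accepts the comma-separated list and rejects the trailing comma. Note that, as written, it handles only commas: the semicolon-separated list is rejected unless the `,` is replaced with `[,;]`. It also accepts the empty string, because the outer group may repeat zero times.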

