
In a regular expression, how to match two different cases of a string

I am trying to extract strings from a list of files produced by the ls command. I have these two cases:

"filename"
"link File" -> "filename"

In Python, I wrote this code:

print(re.findall( r'"(.*?)"', linha))

The REs I tried:

"(.*?)"               -: match ['filename']                CORRECT
                               ['link File" -> "filename'] WRONG
"(.*?)" -> "(.*?)"    -: match ['']                        WRONG
                               ['link File', 'filename']   CORRECT

What single RE gives this result for both cases?

                      -: match ['filename', '']            CORRECT
                               ['link File', 'filename']   CORRECT

You have an optional section, so use a ? to match it if it is there. Next, you want to exclude " from your matches, since your targets are surrounded by quotes. Using [^"]* instead of .*? means a group can never run past a closing quote, which also makes it easier for the regex engine to match your string:

"([^"]*)"(?: -> "([^"]*)")?

The (?:...) grouping is non-capturing, and the ? after it makes the whole group optional.
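To make the pieces of that pattern easier to see, here is the same expression written with re.VERBOSE. This is only a sketch for illustration; the name QUOTED_PAIR is mine, not part of the question, and the behaviour should be identical to the one-line pattern.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so each literal
# space in the separator is written as [ ].
QUOTED_PAIR = re.compile(r'''
    "([^"]*)"        # group 1: first quoted string, no quotes inside
    (?:              # non-capturing group around the optional part
        [ ]->[ ]     # the literal arrow separator with its spaces
        "([^"]*)"    # group 2: second quoted string
    )?               # the arrow and second string may be absent
''', re.VERBOSE)

print(QUOTED_PAIR.findall('"filename"'))
print(QUOTED_PAIR.findall('"link File" -> "filename"'))
```

The [ ] trick is needed only because of re.VERBOSE; in the one-line version a plain space works.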

When you use this with re.findall(), you'll always get tuples with two groups, the second one being an empty string for those inputs where -> "..." is missing:

>>> import re
>>> re.findall(r'"([^"]*)"(?: -> "([^"]*)")?', '"filename"')
[('filename', '')]
>>> re.findall(r'"([^"]*)"(?: -> "([^"]*)")?', '"link File" -> "filename"')
[('link File', 'filename')]
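If you process the ls output one line at a time, a match object may be more convenient than re.findall(), because a group that did not take part in the match comes back as None rather than ''. A minimal sketch, assuming each line holds at most one entry; parse_entry and PATTERN are hypothetical names, and linha is borrowed from the question:

```python
import re

PATTERN = re.compile(r'"([^"]*)"(?: -> "([^"]*)")?')

def parse_entry(linha):
    """Return (name, target) for one line, or None if nothing matches.

    target is None when the line has no ' -> "..."' part.
    """
    m = PATTERN.search(linha)
    if m is None:
        return None
    # group(2) is None when the optional group did not participate
    return m.group(1), m.group(2)

for linha in ['"filename"', '"link File" -> "filename"', 'no quotes here']:
    print(parse_entry(linha))
# ('filename', None)
# ('link File', 'filename')
# None
```

This makes it easy to tell a plain file from a link with `target is None`, without comparing against an empty string.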

I've created an online demonstration with Regex101 (which, for some reason, requires us to explicitly escape double quotes, not something that Python actually would require). It contains a breakdown of the pattern on the right-hand side under the 'Explanation' banner.

