
How do I select from `|` to `|`?

How can I select the text from | to | ? For example:

I have to select | this part | and not this

I tried (^|\\>|\\s)\\|(\\S+) , but that selects only the first word.

I need to select all the characters between the first | and the second | . Do you have any suggestions on how I can achieve this?

You can use this regex and read the contents from group 1:

\|([^|]*)\|

Here, | is a metacharacter (it means alternation), so it needs escaping outside a character class. The pattern first matches a literal | , then captures zero or more characters other than | in group 1, and finally matches the closing | . The text you want is the contents of group 1.
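As a minimal sketch of the above, assuming Python's built-in re module (the question does not name a language, so the choice of Python and the variable names are assumptions), this applies the pattern to the sentence from the question and reads group 1:

```python
import re

# Sample sentence taken from the question.
text = "I have to select | this part | and not this"

# Literal pipe, then capture everything that is not a pipe, then a literal pipe.
pattern = re.compile(r"\|([^|]*)\|")

match = pattern.search(text)
selected = match.group(1) if match else None

print(repr(selected))    # the captured text, including surrounding spaces
print(selected.strip())  # the same text with the spaces trimmed
```

Note that the captured group keeps the spaces next to each pipe, so you may want to trim it afterwards.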


Try \|(.*?)\| . The question mark after the * makes the quantifier non-greedy (lazy), so the match stops at the first closing | instead of the last one.
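To illustrate the difference the question mark makes, here is a small sketch using Python's re module (the sample string and variable names are made up for the illustration). It runs the lazy pattern and its greedy counterpart on the same input:

```python
import re

# Hypothetical input with more than one closing pipe.
text = "| one | two | three |"

# Lazy: stop at the first pipe that can close the match.
lazy = re.search(r"\|(.*?)\|", text).group(1)

# Greedy: run as far as possible, then back off to the last pipe.
greedy = re.search(r"\|(.*)\|", text).group(1)

print(repr(lazy))
print(repr(greedy))
```

The lazy version captures only the first segment, while the greedy one swallows the inner pipes as well.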

Try using this:

\|(.*?[^\|])\|

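This matches any characters except a newline between two pipes (each pipe is escaped with a backslash). The [^\|] requires the last captured character to be something other than a pipe, so at least one character must sit between the delimiters.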

The other answers are great if you only have a single pair of | , but what if you have multiple instances that you want to match? For example:

| one | two | three | four | five |

In the example above, there are five strings that sit between two | characters. Any of the answers above will match only one, three and five, and will not match two or four.

At this point you may be wondering: why? The answer is simple: the regex engine cannot match the same text twice. Each match consumes the characters it covers, and successive matches cannot overlap.

Consider what happens when it matches | one | , for example. Because the | after one has already been consumed by that match and can't be matched again, the remaining text that is available for matching is:

 two | three | four | five |

Note the lack of a | before two. In this remaining text, two is not preceded by a | , so it cannot match, and | three | is actually the next match. The same happens with four.
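This behavior can be reproduced with a short sketch, assuming Python's re module (the variable names are illustrative). It runs the capturing pattern from the earlier answer over the whole string with findall:

```python
import re

text = "| one | two | three | four | five |"

# Each match consumes both of its pipes, so the next search
# starts after the closing pipe of the previous match.
consumed = [m.strip() for m in re.findall(r"\|([^|]*)\|", text)]

print(consumed)  # ['one', 'three', 'five']
```

Every second item is skipped because its opening pipe was already used as the closing pipe of the previous match.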

What you need is a way to check for the presence of | without including it in the match. This can be achieved using lookaheads and lookbehinds. Whether this works depends on the regex flavor you're using actually providing these constructs, so your mileage may vary.

This is what a positive lookbehind looks like:

(?<=insert_expression_here)

It will try to match whatever expression you put there, ending the match exactly at the current position in the original expression.

A positive lookahead does kind of the opposite:

(?=insert_expression_here)

It will try to match whatever expression you put there, starting the match exactly at the current position in the original expression.
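As a rough illustration of both constructs, assuming Python's re module (which supports lookaheads and fixed-width lookbehinds; the tiny input string is made up), the following shows that the pipe is checked but never becomes part of the match:

```python
import re

text = "a|b"

# Lookbehind: 'b' matches only because a pipe sits immediately before it.
behind = re.search(r"(?<=\|)b", text)

# Lookahead: 'a' matches only because a pipe sits immediately after it.
ahead = re.search(r"a(?=\|)", text)

print(behind.group(), behind.span())  # b (2, 3)
print(ahead.group(), ahead.span())    # a (0, 1)
```

In both cases the span covers a single character, which shows that the lookaround itself has zero width.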

Knowing this, it becomes clear that we must check for | at the start and at the end of the match, using a lookbehind ( (?<=\|) ) at the start and a lookahead ( (?=\|) ) at the end.

This is what the final expression looks like:

(?<=\|).*?(?=\|)


There's no need for a capture: the only text that will match is the text you're interested in. Also, note that we are using a lazy expression: instead of trying to match as many characters as possible (the default behavior), which would match the whole string, we match as few characters as possible. This ensures that there are no stray | characters inside your match.
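Here is a minimal sketch of the final expression in action, again assuming Python's re module (other flavors may differ in their lookbehind support, and the variable names are illustrative):

```python
import re

text = "| one | two | three | four | five |"

# The pipes are only asserted, never consumed, so the pipe that closes
# one match can also serve as the opening pipe of the next match.
parts = [m.strip() for m in re.findall(r"(?<=\|).*?(?=\|)", text)]

print(parts)  # ['one', 'two', 'three', 'four', 'five']
```

Since the pattern has no capture group, findall returns the full matches, which are trimmed here only to drop the spaces around each word.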

And here is a great tutorial if you want to learn more about lookaheads and lookbehinds. Learning about them will not only give you more options when constructing regular expressions, but also give you better insight into how the regex engine works.
