I'm looking for a regexp that finds strings starting with a dash and ending with a dash or a point, in order to manually evaluate cases where dashes must be replaced with em-dashes.
For example, the text below:
-hi there. -hello-. It's nice -said while looking at the window- if you could come.
Needs to be replaced with
—hi there. —hello—. —good morning —he said.
But these dashes must remain unchanged:
1992-1994 MTS-O
Since I don't think a fully automated solution is possible, I'm looking to speed up the manual review with a single regexp that replaces these two:
–(.+?)–
–(.+?)\.
with one that matches a dash or a point at the end and lets me do a fast substitution that conditionally replaces the en dash when that is matched, or keeps the point if that is matched.
Maybe you can settle for a simple pattern as suggested, but that might cause problems with some edge cases. It takes a little more to fulfill all your requirements:
"...a regexp formula that finds a string starting with a dash and ending with a dash or a point"
However, if you want to do it in one go, you may need a PCRE pattern like this:
(?=^-.*[.-]$)-|\G(?!^).*\K-
First, verify the whole string with a lookahead: (?=^-.*[.-]$). If it matches, we are at the start of the string.
Then we match the first dash to replace it, followed by a \G alternative that continues from the end of the previous match to find subsequent dashes that are not at the starting position, (?!^). We skip ahead to a later - with .* (being greedy, it lands on the last one on the line) and use \K to drop everything before it. Fun, right?
In general, I would suggest using two regexes: the first to find and verify the pattern in question, and the second to do the replacement. But that is probably not an option in your environment.
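If a scripting language is available after all, the two-regex idea could look roughly like the Python sketch below. It is only an illustration under some assumptions: each candidate segment is already a separate string, the dashes are plain ASCII hyphens, and the names CANDIDATE, EDGE_DASH and replace_edge_dashes are made up for this example. It does not reproduce the PCRE pattern above; it only replaces the leading dash and a dash that closes the segment.

```python
import re

EM_DASH = "\u2014"

# Step 1 (verify): the segment starts with a dash and ends with a dash or a point.
CANDIDATE = re.compile(r"-.*[.-]")

# Step 2 (replace): a dash at the very start, or a dash at the very end
# that is optionally followed by the final point.
EDGE_DASH = re.compile(r"^-|-(?=\.?$)")


def replace_edge_dashes(segment: str) -> str:
    """Return the segment with its edge dashes turned into em-dashes."""
    if not CANDIDATE.fullmatch(segment):
        # Not a candidate (e.g. "1992-1994 MTS-O"): leave it untouched.
        return segment
    return EDGE_DASH.sub(EM_DASH, segment)


if __name__ == "__main__":
    for sample in ["-hi there.", "-hello-.", "1992-1994 MTS-O"]:
        print(repr(sample), "->", repr(replace_edge_dashes(sample)))
```

Keeping the verification separate from the replacement makes each pattern easy to read, and strings such as 1992-1994 MTS-O never reach the second step.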
My guess is that maybe these simple expressions,
(?=-)-
or, more accurately, for strings ending with a point:
(?=-.*\.$)-
with a simple replacement of —, might work.
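To see what the second expression does in practice, here is a small Python sketch (the function name dashes_before_final_point is hypothetical, and plain ASCII hyphens are assumed). Note that the lookahead in (?=-)- only re-asserts the dash it is about to match, so that first expression behaves like a plain -. The second one replaces every dash in a string that ends with a point, so a manual review is still needed for cases like year ranges.

```python
import re

EM_DASH = "\u2014"

# Matches a dash only when the string, read from that dash onward, ends with a point.
DASH_BEFORE_FINAL_POINT = re.compile(r"(?=-.*\.$)-")


def dashes_before_final_point(text: str) -> str:
    """Replace each dash with an em-dash if the text ends with a point."""
    return DASH_BEFORE_FINAL_POINT.sub(EM_DASH, text)


if __name__ == "__main__":
    for sample in ["-hi there.", "-hello-.", "1992-1994 MTS-O", "1992-1994."]:
        print(repr(sample), "->", repr(dashes_before_final_point(sample)))
```

The last sample shows the limitation: a year range followed by a point is also converted, which is exactly the kind of case that has to be checked by hand.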