I have three strings in the following formats:
Bank: {"955974044748481":["BANK_A"]}
{"reason": "Bank: {"455049295219902":["BANK_B"]}"}
{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}
I need to extract the bank_id and bank_name from these strings using a single regex in a Presto SQL statement.
I have tried this regex, but it only matches the first two strings and not the last one, which has escape characters: https://regex101.com/r/ejW68x/1
Bank: {"(.*)":\["(.*)"\]}
What's the right way to capture all 3 variations?
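To see the failure concretely, here is a minimal sketch that runs the pattern from the question against the three sample strings. It uses Python's `re` module only as a convenient stand-in for Presto's regex engine (an assumption: the constructs involved should behave the same way there). The names `samples` and `original` are illustrative.

```python
import re

# The three sample strings from the question. The raw string for the third
# keeps the two literal backslashes before each double quote.
samples = [
    'Bank: {"955974044748481":["BANK_A"]}',
    '{"reason": "Bank: {"455049295219902":["BANK_B"]}"}',
    r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}',
]

# Pattern from the question: it requires a bare " immediately after {,
# so the escaped variant (where { is followed by \\") cannot match.
original = re.compile(r'Bank: {"(.*)":\["(.*)"\]}')

for s in samples:
    m = original.search(s)
    print(m.groups() if m else None)
# ('955974044748481', 'BANK_A')
# ('455049295219902', 'BANK_B')
# None
```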
How about something like this:
Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}
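A quick sketch of the first pattern applied to the same three strings, again using Python's `re` as a stand-in for Presto (an assumption). The pattern text is copied unchanged into a raw string, so `\\\\` reaches the regex engine as "two literal backslashes". The variable names are illustrative.

```python
import re

samples = [
    'Bank: {"955974044748481":["BANK_A"]}',
    '{"reason": "Bank: {"455049295219902":["BANK_B"]}"}',
    r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}',
]

# Each (?:\\\\)? optionally consumes the two backslashes that precede
# an escaped double quote; group 1 is bank_id, group 2 is bank_name.
pattern1 = re.compile(
    r'Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'
)

for s in samples:
    bank_id, bank_name = pattern1.search(s).groups()
    print(bank_id, bank_name)
# 955974044748481 BANK_A
# 455049295219902 BANK_B
# 1876212592475597 BANK_C
```

In Presto itself, the same pattern string would go into `regexp_extract(string, pattern, group)`, called once with group 1 for bank_id and once with group 2 for bank_name. Presto string literals do not treat backslashes as escapes, so the pattern should paste in unchanged.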
Or, to make sure the \\ prefixes are only matched in pairs (both double quotes around a value are escaped, or neither is):
Bank:.*{((?:\\\\)?)"([^{"]*?)\1":\[((?:\\\\)?)"(.*?)\3"\]}
Note that with the second pattern, bank_id and bank_name are captured in groups #2 and #4.
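The effect of the backreferences can be checked with a small sketch (Python's `re` as a stand-in for Presto, an assumption). The `mixed` string is a hypothetical malformed input, not from the question, in which only the opening quote of the id is escaped: the first pattern still accepts it, while the second rejects it because the prefixes are not paired.

```python
import re

samples = [
    'Bank: {"955974044748481":["BANK_A"]}',
    '{"reason": "Bank: {"455049295219902":["BANK_B"]}"}',
    r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}',
]

pattern1 = re.compile(
    r'Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'
)
# Groups 1 and 3 capture the optional \\ prefix; \1 and \3 then demand
# the same prefix before the matching closing quote.
pattern2 = re.compile(
    r'Bank:.*{((?:\\\\)?)"([^{"]*?)\1":\[((?:\\\\)?)"(.*?)\3"\]}'
)

for s in samples:
    m = pattern2.search(s)
    print(m.group(2), m.group(4))  # bank_id, bank_name

# Hypothetical input: the id's opening quote is escaped, its closing quote is not.
mixed = r'Bank: {\\"123":["X"]}'
print(pattern1.search(mixed).groups())  # ('123', 'X')
print(pattern2.search(mixed))           # None
```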
Your new test strings would still be matched by the above patterns. If you like, you can replace Bank:.* with Bank:[ ], since in every sample exactly one space separates Bank: from {. Demo1 - Demo2.
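A minimal check that the two prefixes give the same captures on the three samples, using Python's `re` as a stand-in for Presto (an assumption). The split into `body`, `loose` and `strict` is purely illustrative.

```python
import re

samples = [
    'Bank: {"955974044748481":["BANK_A"]}',
    '{"reason": "Bank: {"455049295219902":["BANK_B"]}"}',
    r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}',
]

# Everything after the "Bank:" prefix, taken from the first pattern.
body = r'{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'

loose = re.compile(r'Bank:.*' + body)    # any characters between "Bank:" and "{"
strict = re.compile(r'Bank:[ ]' + body)  # exactly one space, as in every sample

for s in samples:
    print(loose.search(s).groups() == strict.search(s).groups())  # True
```

The stricter prefix avoids the backtracking of `.*` and only differs from the looser one on inputs where something other than a single space sits between `Bank:` and `{`.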
- Added (?:\\\\)?, an optional non-capturing group that matches the two backslash characters.
- Replaced your first capturing group (.*) with ([^{"]*?) so that it cannot match double-quote or { characters (this is especially necessary for your first test strings). It is also lazy instead of greedy (the trailing ?), so that it does not capture the escaping backslashes (\\) when they are present.
- Made the second capturing group lazy as well, (.*?), for the same reason.
- In the second pattern, (?:\\\\)? is wrapped in a capturing group so that it can be backreferenced (i.e., \1 and \3). This way the pattern matches only when both double quotes around a value are escaped (preceded by \\) or neither is.
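The reason for the lazy first group can be seen in a short sketch on the third sample (Python's `re` as a stand-in for Presto, an assumption). Only the part of the pattern up to `:\[` is kept, which is enough to show the difference; `greedy` and `lazy` are illustrative names.

```python
import re

s = r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}'

# Greedy [^{"]* runs up to the next " and swallows the two backslashes;
# the following (?:\\\\)? then simply matches nothing.
greedy = re.compile(r'Bank:.*{(?:\\\\)?"([^{"]*)(?:\\\\)?":\[')
# Lazy [^{"]*? stops as soon as (?:\\\\)?":\[ can match, leaving the
# backslashes to the optional group.
lazy = re.compile(r'Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[')

print(greedy.search(s).group(1))  # 1876212592475597\\
print(lazy.search(s).group(1))    # 1876212592475597
```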