
Capture strings inside escaped quotes

I have three strings in the following formats:

Bank: {"955974044748481":["BANK_A"]}
{"reason": "Bank: {"455049295219902":["BANK_B"]}"}
{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}

I need to extract the bank_id and bank_name from these strings using a single regex in a Presto SQL statement.

I have tried this regex, but it only captures the first two and not the last one, which has escape characters. https://regex101.com/r/ejW68x/1

Bank: {"(.*)":\["(.*)"\]}

What's the right way to capture all 3 variations?

How about something like this:

Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}

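As a quick sanity check, the pattern can be run against the three sample strings from the question. This is only a sketch in Python: the re module stands in for Presto's regex engine (an assumption, the two may differ in edge cases), and the names SAMPLES and PATTERN are illustrative.

```python
import re

# The three sample strings from the question, written as raw strings so the
# backslashes in the third one are kept exactly as shown.
SAMPLES = [
    r'Bank: {"955974044748481":["BANK_A"]}',
    r'{"reason": "Bank: {"455049295219902":["BANK_B"]}"}',
    r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}',
]

# First pattern from the answer, unchanged.
PATTERN = re.compile(
    r'Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'
)

# Group 1 is the bank id, group 2 is the bank name.
results = [PATTERN.search(s).groups() for s in SAMPLES]
for bank_id, bank_name in results:
    print(bank_id, bank_name)
```

In Presto the same pattern would presumably be passed to regexp_extract(col, pattern, 1) and regexp_extract(col, pattern, 2); that has not been tested here.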

Or to make sure the \\ are only matched in pairs:

Bank:.*{((?:\\\\)?)"([^{"]*?)\1":\[((?:\\\\)?)"(.*?)\3"\]}


Note that in the second case, your captures will be in groups #2 and #4.
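The group numbering and the effect of the backreferences can be checked with a short Python sketch. Python's re module is again only a stand-in for Presto's engine (an assumption), and the "unbalanced" input is a made-up string, not one of the question's samples.

```python
import re

# Second pattern from the answer: groups 1 and 3 hold the optional "\\",
# groups 2 and 4 hold the bank id and the bank name.
PAIRED = re.compile(
    r'Bank:.*{((?:\\\\)?)"([^{"]*?)\1":\[((?:\\\\)?)"(.*?)\3"\]}'
)

escaped = r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}'
# Hypothetical input: the opening quote is escaped, the closing one is not.
unbalanced = r'Bank: {\\"123":["BANK_X"]}'

m = PAIRED.search(escaped)
bank_id, bank_name = m.group(2), m.group(4)
print(bank_id, bank_name)

# The backreference \1 demands the same prefix before the closing quote,
# so the unbalanced input does not match at all.
rejected = PAIRED.search(unbalanced) is None
print(rejected)
```

The first pattern, which treats each escape as independently optional, would accept the unbalanced input; whether that matters depends on how strict the extraction needs to be.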


Update:

Your new test strings would still be matched by the above patterns. If you like, you can replace Bank:.* with Bank:[ ] so that exactly one space is allowed between "Bank:" and the opening brace.
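The difference between the two prefixes only shows up when something other than a single space sits between "Bank:" and the brace. A minimal Python sketch, assuming Python's re behaves like Presto's engine for these patterns; TAIL, LOOSE, STRICT and the "noisy" input are illustrative names and data, not part of the question.

```python
import re

# Shared remainder of the first pattern from the answer.
TAIL = r'{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'
LOOSE = re.compile(r'Bank:.*' + TAIL)    # anything may precede the brace
STRICT = re.compile(r'Bank:[ ]' + TAIL)  # exactly one space before the brace

sample = r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}'
# Hypothetical input with extra text between "Bank:" and the opening brace.
noisy = r'Bank: n/a {"1":["BANK_X"]}'

strict_groups = STRICT.search(sample).groups()
loose_noisy = LOOSE.search(noisy) is not None
strict_noisy = STRICT.search(noisy) is not None
print(strict_groups, loose_noisy, strict_noisy)
```

The stricter prefix still handles the escaped sample, but it rejects the noisy input that the .* version accepts.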

Explanation (changes to your pattern):

  • Added (?:\\\\)?, an optional non-capturing group that matches the two backslash characters.

  • Replaced your first capturing group (.*) with ([^{"]*?) to avoid matching double-quote and { characters (this is especially necessary for your first test strings). Also converted it from greedy to lazy (by adding ?) to avoid capturing the escaping characters (\\) if present.

  • Made the second capturing group lazy as well (.*?) for the same reason.

  • In the second pattern, (?:\\\\)? was placed inside a capturing group so that a backreference can be used (i.e., \1 and \3). The purpose of this is to match only if both double-quote characters are escaped (preceded by \\).
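The greedy versus lazy point can be illustrated by comparing the pattern with a variant whose two capturing groups are greedy. This is a sketch in Python, used as a stand-in for Presto's regex engine (an assumption); the GREEDY variant exists only for comparison and is not something the answer proposes.

```python
import re

# Third sample string from the question (the one with escaped quotes).
sample = r'{"reason": "Bank: {\\"1876212592475597\\":[\\"BANK_C\\"]}"}'

# First pattern from the answer, with lazy capturing groups.
LAZY = re.compile(
    r'Bank:.*{(?:\\\\)?"([^{"]*?)(?:\\\\)?":\[(?:\\\\)?"(.*?)(?:\\\\)?"\]}'
)
# Same pattern with both capturing groups made greedy (comparison only).
GREEDY = re.compile(
    r'Bank:.*{(?:\\\\)?"([^{"]*)(?:\\\\)?":\[(?:\\\\)?"(.*)(?:\\\\)?"\]}'
)

lazy_groups = LAZY.search(sample).groups()
greedy_groups = GREEDY.search(sample).groups()
print(lazy_groups)    # captures without the trailing backslashes
print(greedy_groups)  # each capture ends with the two backslashes
```

With the greedy groups, the trailing \\ is swallowed by the capture because the optional (?:\\\\)? that follows is happy to match nothing; the lazy groups stop early enough for that optional group to consume the backslashes instead.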
