Javascript Regex multi-line base64

Question

I have the following from a MIME message:

--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--

I want to extract the base64 encoded message, regardless of how many lines it is.

The following regex does find a match on each individual line, but how can I group them so that consecutive lines of base64 are treated as a single match?

var base64Regex = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}={2})$/gm

When the MIME content for example also contains a PGP signature, this would give me 4 or 5 matches, so I can't simply join them, because it will find that base64 as well.

Ideally I'd modify this so it gets everything from/including the first match to ---------- and says that is "match 1" and if it finds another block of base64, that is "match 2", etc.

Here is a link to regex101 showing 2 matches. In short, I would like for this to be one match.

https://regex101.com/r/32WjKa/1

Answer 1

Would this help?

var base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;

Content-Transfer-Encoding: base64 - This is the start of the base64 encoded message.

[\s\S]*? - This is the base64 encoded message. It can be on multiple lines.

\s*?-- - This is the end of the base64 encoded message: optional whitespace, then the -- that starts the next boundary line.

g - This is the global flag, so that it will match all instances of the regex
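As a rough sketch of how the global flag gives one match per block, the regex above can be combined with matchAll(). The sample message, its boundary string, and the helper name extractBase64Blocks are made up for illustration; the sketch also assumes Content-Transfer-Encoding is the last header before the body, otherwise the capture group would pick up the following header lines as well.

```javascript
// Hypothetical two-part sample; the boundary and contents are illustrative only.
const message = [
  '--boundary42',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'SGVsbG8g',
  'V29ybGQ=',
  '',
  '--boundary42',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'Rm9v',
  '',
  '--boundary42--',
].join('\n');

// Same pattern as in the answer above.
const base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;

// Returns one string per base64 block, with line breaks and spaces removed.
function extractBase64Blocks(text) {
  return Array.from(text.matchAll(base64Regex), (m) => m[1].replace(/\s+/g, ''));
}

const blocks = extractBase64Blocks(message);
console.log(blocks); // one entry per base64 part
```

Each array entry corresponds to "match 1", "match 2", and so on from the question, already joined into a single line.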

Answer 2

Instead of looking for base64 characters, I'd look for all characters (including newlines) between the start and end of the payload.

By default, . in JavaScript regexes, even in multi-line mode, won't match line breaks. The s (dotAll) flag allows . to match line breaks.

With this method, you can remove linebreaks after you match with a simple replace()

const str = `--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--`

const payload = str.match(/base64\n\n(.+)\n\n--------------.+/ms)[1].replace(/\n/g, '')

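You might also be better off using a dedicated parsing library rather than a regex, since payloads like this follow a standard format. Note that body-parser targets HTTP request bodies, so a MIME/email parser is likely the closer fit here.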
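One thing to watch for: MIME messages often use CRLF line endings, in which case a pattern with literal \n\n will not match. Below is a minimal sketch of the same approach that tolerates both, and then decodes the result. The helper names extractPayload and decodeBase64Utf8 are hypothetical, and the sketch assumes an environment where atob and TextDecoder are available globally (modern browsers, recent Node.js).

```javascript
// Same sample as above, but joined with CRLF to mimic a raw message.
const crlfStr = [
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8; format=flowed',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg',
  'bGluZQ0KDQoNClRoYW5rcw0KDQo=',
  '',
  '--------------ra650umTsDNeI5lwXmFy5luF--',
].join('\r\n');

// Capture everything between the blank line after "base64" and the blank
// line before the closing boundary; \r?\n accepts both LF and CRLF.
function extractPayload(text) {
  const match = text.match(/base64\r?\n\r?\n(.+?)\r?\n\r?\n--/s);
  return match ? match[1].replace(/[\r\n]/g, '') : null;
}

// Decode base64 to bytes, then interpret the bytes as UTF-8 text.
function decodeBase64Utf8(b64) {
  const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

const crlfPayload = extractPayload(crlfStr);
const decodedText = decodeBase64Utf8(crlfPayload);
console.log(decodedText);
```

Returning null instead of indexing [1] directly avoids a TypeError when the input contains no base64 part.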

Answer 3

Here are two solutions: one uses a regex with .replace(), the other uses .match() with a positive lookbehind and a positive lookahead:

const input = `--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--`;

const regex1 = /^.*?Content-Transfer-Encoding: base64\s+(.*?)\s*---.*$/is;
let result1 = input.replace(regex1, '$1');
console.log(result1);

const regex2 = /(?<=Content-Transfer-Encoding: base64\s+).*?(?=\s*---)/is;
let result2 = input.match(regex2);
console.log(result2[0]);

Output:

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

Explanation of regex 1 for .replace() :

^ -- anchor at start of string
.*?Content-Transfer-Encoding: base64\s+ -- everything up to and including the literal text ...base64 , plus the whitespace that follows
(.*?) -- capture group one: non greedy capture all, until:
\s*---.* -- whitespace, --- , and everything after that
$ -- anchor at end of string
use the i and s flags for case-insensitive matching, and to match newlines with . , respectively

Explanation of regex 2 for .match() :

(?<=Content-Transfer-Encoding: base64\s+) -- positive lookbehind for literal text ...base64 , including whitespace
.*? -- non-greedy scan, until:
(?=\s*---) -- positive lookahead for whitespace and ---
use the i and s flags for case-insensitive matching, and to match newlines with . , respectively

Notes:

Keep in mind that not all regex flavors and browsers support lookbehind; Safari, notably, lacked it before version 16.4
It is safe to scan for --- to find the end because dashes are not part of base64 characters
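Building on the note that dashes never appear in base64, here is a hedged sketch that avoids lookbehind entirely and sanity-checks each capture against the base64 alphabet before accepting it. The sample message, its boundary, and the names BASE64_BODY and extractValidBase64 are invented for illustration. In the second part an extra header follows the encoding header, so the raw capture contains non-base64 characters and is rejected.

```javascript
// Whole-string base64 check (no m flag): groups of 4, optional padded tail.
const BASE64_BODY = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

// Hypothetical sample; the second part has a header after the encoding line.
const sample = [
  '------example',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'SGVsbG8g',
  'V29ybGQ=',
  '',
  '------example',
  'Content-Transfer-Encoding: base64',
  'Content-Disposition: inline',
  '',
  'Rm9v',
  '',
  '------example--',
].join('\n');

// Capture-group variant of the regexes above (no lookbehind), applied
// globally; only captures that validate as base64 are kept.
function extractValidBase64(text) {
  const re = /Content-Transfer-Encoding: base64\s+(.*?)\s*---/gis;
  const results = [];
  for (const m of text.matchAll(re)) {
    const candidate = m[1].replace(/\s+/g, '');
    if (candidate.length > 0 && BASE64_BODY.test(candidate)) {
      results.push(candidate);
    }
  }
  return results;
}

const validBlocks = extractValidBase64(sample);
console.log(validBlocks);
```

Rejecting the second part is a deliberately conservative choice; a fuller solution would skip the remaining header lines instead, which is where a real MIME parser becomes the better tool.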

3 answers

solution1
1 2022-11-20 20:27:53

solution2
0 2022-11-20 20:28:48

solution3
0 2022-11-21 01:01:38
