I'm writing a piece of code that checks the MP3 files on my server and reports whether any of them contain a false sync. In short, I load each file in PHP with fread() and keep the stream in a variable. After splitting that stream into separate streams for ID3v1 (not relevant here, it is not subject to sync), ID3v2 (the main problem) and audio, I have to apply the unsynchronisation scheme, quoted below from the ID3v2 specification, to the ID3v2 stream.
The only purpose of the 'unsynchronisation scheme' is to make the ID3v2 tag as compatible as possible with existing software. There is no use in 'unsynchronising' tags if the file is only to be processed by new software. Unsynchronisation may only be made with MPEG 2 layer I, II and III and MPEG 2.5 files.
Whenever a false synchronisation is found within the tag, one zeroed byte is inserted after the first false synchronisation byte. The format of a correct sync that should be altered by ID3 encoders is as follows:
%11111111 111xxxxx
And should be replaced with:
%11111111 00000000 111xxxxx
This has the side effect that all $FF 00 combinations have to be altered, so they won't be affected by the decoding process. Therefore all the $FF 00 combinations have to be replaced with the $FF 00 00 combination during the unsynchronisation.
To indicate usage of the unsynchronisation, the first bit in 'ID3 flags' should be set (note: I've found that bit). This bit should only be set if the tag contains a, now corrected, false synchronisation. The bit should only be clear if the tag does not contain any false synchronisations.
Do bear in mind, that if a compression scheme is used by the encoder, the unsynchronisation scheme should be applied afterwards. When decoding a compressed, 'unsynchronised' file, the 'unsynchronisation scheme' should be parsed first, decompression afterwards.
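The flag mentioned in the quote can be checked before deciding whether the tag needs decoding. The following is only a minimal sketch: the function name `id3v2IsUnsynchronised` is made up for illustration, and it assumes the 10-byte ID3v2 header layout ("ID3", two version bytes, one flags byte, four size bytes), where the first bit of 'ID3 flags' is the most significant bit of the byte at offset 5.

```php
<?php
// Hypothetical helper (not part of the original code): reports whether the
// unsynchronisation flag is set in an ID3v2 header.
function id3v2IsUnsynchronised(string $header): bool
{
    // Assumed header layout: "ID3", 2 version bytes, 1 flags byte, 4 size bytes.
    if (strlen($header) < 10 || substr($header, 0, 3) !== 'ID3') {
        return false; // no ID3v2 header at this position
    }
    $flags = ord($header[5]);
    // The first (most significant) bit of the flags byte marks unsynchronisation.
    return ($flags & 0x80) !== 0;
}
```

It would be called with the first 10 bytes read from the file, for example the result of `fread($f, 10)` right after opening it.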
1. How to search & replace this bit-pattern %11111111 111xxxxx with %11111111 00000000 111xxxxx?
2. Vice versa, how to search & replace this bit-pattern %11111111 00000000 111xxxxx with %11111111 111xxxxx?

...using preg_replace().
The code I've written so far works perfectly, and I'm missing just one more line (well, two, to be exact).
<?php
// some basic checkings here, such as 'does file exist'
// and 'is it readable'
$f = fopen('test.mp3', 'r');
// ...rest of my code...
$pattern1 = '?????'; // pattern from 1st question
$id3stream = preg_replace($pattern1, 'something1', $id3stream);
// ...extracting frames...
$pattern2 = '?????'; // pattern from 2nd question
$id3stream = preg_replace($pattern2, 'something2', $id3stream);
// ..do more job...
fclose($f);
?>
How can I make those two preg_replace() lines work?
PS I know how to do it by reading byte after byte in a loop, but I'm sure it is possible with regular expressions (to be honest, I'm bad at regex).
Let me know if you need more details.
At the moment I'm using this pattern
$pattern0 = '/[\x00].*/';
echo preg_replace($pattern0, '', $input_string);
to cut off the part of the string from the first zero byte to the end. Is that the correct way to do this?
( @mario's answer ).
In the first couple of tests, this code returned the correct result.
// print original stream
printStreamHex($stream_original, 'ORIGINAL STREAM');
// adding zero pads on unsync scheme
$stream_1 = preg_replace(':([\\xFF])([\\xE0-\\xFF]):', "$1\x00$2", $stream_original);
printStreamHex($stream_1, 'AFTER ADDING ZEROS');
// reversing process
$stream_2 = preg_replace(':([\\xFF])([\\x00])([\\xE0-\\xFF]):', "$1$3", $stream_1);
printStreamHex($stream_2, 'AFTER REMOVING ZEROS');
echo "Status: <b>" . ($stream_original == $stream_2 ? "OK" : "Failed") . "</b>";
But minutes later, I found a specific case where everything looks like the expected result, yet there are still FFE0+ pairs in the stream.
ORIGINAL STREAM
+-----------------------------------------------------------------+
| FF E0 DB 49 53 BE 3B E0 90 40 EA 2B 3A 61 FF FA |
| 84 E0 A9 99 1F 39 B5 E1 54 FF E7 ED B8 B1 3A 36 |
| 88 01 69 CA 7D 47 FA E1 70 7C 85 34 B8 1A FF FF |
| FF F8 21 F9 2F FF F7 17 67 EB 2A EB 6E 41 82 FF |
+-----------------------------------------------------------------+
AFTER ADDING ZEROS
+-----------------------------------------------------------------+
| FF 00 E0 DB 49 53 BE 3B E0 90 40 EA 2B 3A 61 FF |
| 00 FA 84 E0 A9 99 1F 39 B5 E1 54 FF 00 E7 ED B8 |
| B1 3A 36 88 01 69 CA 7D 47 FA E1 70 7C 85 34 B8 |
| 1A FF 00 FF FF 00 F8 21 F9 2F FF 00 F7 17 67 EB |
| 2A EB 6E 41 82 FF |
+-----------------------------------------------------------------+
AFTER REMOVING ZEROS
+-----------------------------------------------------------------+
| FF E0 DB 49 53 BE 3B E0 90 40 EA 2B 3A 61 FF FA |
| 84 E0 A9 99 1F 39 B5 E1 54 FF E7 ED B8 B1 3A 36 |
| 88 01 69 CA 7D 47 FA E1 70 7C 85 34 B8 1A FF FF |
| FF F8 21 F9 2F FF F7 17 67 EB 2A EB 6E 41 82 FF |
+-----------------------------------------------------------------+
Status: OK
If the stream contains something like FF FF FF FF, it will be replaced with FF 00 FF FF 00 FF, but it should be FF 00 FF 00 FF 00 FF. That remaining FF FF pair will cause a false MP3 synchronisation again, so my goal is to eliminate every FFE0+ pattern before the audio stream (that is, in the ID3v2 tag stream, because MP3 audio starts with an FFE0+ byte pair and its first occurrence should be at the beginning of the audio data). I figured out that I can loop the same regex until I get a stream without any FFE0+ byte pair. Is there a solution that doesn't require a loop?
Great job @mario, thanks a lot!
Binary strings are not quite the turf of regular expressions. But you already had the right approach with using \\x00.
3.. to cut off part of string starting at first zero-byte until the end
$pattern0 = '/[\\x00].*$/';
You were just missing the $ here.
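One caveat for binary data: by default `.` does not match a newline byte (0x0A), so a pattern like this can stop early or fail to match when the part after the zero byte contains one. The sketch below shows two ways around that, under the assumption that everything from the first zero byte onward should be dropped. Both function names are hypothetical; the first avoids regex entirely, the second adds the `s` modifier so that `.` matches any byte.

```php
<?php
// Hypothetical helper: returns the part of $input before the first zero byte.
function cutAtFirstNul(string $input): string
{
    $pos = strpos($input, "\x00");
    return $pos === false ? $input : substr($input, 0, $pos);
}

// Regex variant. The single-quoted pattern passes \x00 to PCRE as an escape
// sequence, and the "s" modifier lets "." match newline bytes as well.
function cutAtFirstNulRegex(string $input): string
{
    return preg_replace('/\x00.*/s', '', $input);
}
```

For a plain "cut at the first zero byte" the strpos()/substr() version is probably the simpler choice, since it has no modifier to forget.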
1.. How to search & replace this bit-pattern %11111111 111xxxxx with %11111111 00000000 111xxxxx?
Use the byte FF followed by a byte in the range E0 to FF for these bit-strings.
$id3stream = preg_replace(':([\\xFF])([\\xE0-\\xFF]):', "$1\x00$2", $id3stream);
Using the $2 here in the replacement string, since you search for a variable byte. Otherwise a simpler str_replace would work.
2.. Vice versa, how to search & replace this bit-pattern %11111111 00000000 111xxxxx with %11111111 111xxxxx?
Same trick.
$id3stream = preg_replace(':([\\xFF])([\\x00])([\\xE0-\\xFF]):', "$1$3", $id3stream);
I would only watch out to always use the \\ double backslash, so that it is PCRE which interprets the \\x00 hex sequences, not the PHP parser. (Otherwise it would end up becoming a C string terminator before it reaches libpcre.)
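For the follow-up case from the question (FF FF FF FF), a lookahead is one way to avoid looping. The two-group pattern consumes the second byte, so that byte cannot start the next match. A lookahead only inspects it. The sketch below is an assumption-laden illustration rather than a drop-in implementation: the function names are hypothetical, and the character class also includes 00 so that existing FF 00 pairs become FF 00 00, as the quoted specification requires.

```php
<?php
// Hypothetical helpers, not part of the original answer.

// Insert 0x00 after every 0xFF that is followed by 0x00 or by 0xE0..0xFF.
// The lookahead does not consume the following byte, so runs such as
// FF FF FF FF are handled in a single pass.
function id3Unsynchronise(string $tag): string
{
    return preg_replace('/\xFF(?=[\x00\xE0-\xFF])/', "\xFF\x00", $tag);
}

// Reverse step: every FF 00 pair collapses back to FF. No variable byte is
// involved, so the simpler str_replace() is enough here.
function id3Resynchronise(string $tag): string
{
    return str_replace("\xFF\x00", "\xFF", $tag);
}

$original = "\xFF\xFF\xFF\xFF";
$encoded  = id3Unsynchronise($original);
echo bin2hex($encoded), "\n";                               // ff00ff00ff00ff
echo id3Resynchronise($encoded) === $original ? "OK" : "Failed", "\n";
```

Note that an FF as the very last byte of the tag is left alone by this sketch, since the lookahead has nothing to inspect there. Whether that byte can pair up with the first byte after the tag has to be checked separately.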