
Memory-efficient base64 decoding

We're having some trouble in our application with people pasting images into our rich-text WYSIWYG, at which point they exist as base64-encoded strings. e.g.:

<img src="data:image/png;base64,iVBORw..." />

The submission form is submitted and processed just fine, but when our application generates a page containing multiple images it can cause PHP to hit its memory limit, as well as bloating the page source, etc.

What I've done is written some code to add to our form processor that extracts the embedded images, writes them to a file, and then puts the URL in the src attribute. The problem is that while processing an image, memory usage spikes to 4x the size of the data, which could potentially break the form processor as well.

My POC code:

<?php
function profile($label) {
    printf("%10s %11d %11d\n", $label, memory_get_usage(), memory_get_peak_usage());
}

function handleEmbedded(&$src) {
    $dom = new DOMDocument;
    $dom->loadHTML($src);
    profile('domload');
    $images = $dom->getElementsByTagName('img');
    profile('getimgs');
    foreach ($images as $image) {
        if( strpos($image->getAttribute('src'), 'data:') === 0 ) {
            $image->setAttribute('src', saneImage($image->getAttribute('src')));
        }
    }
    profile('presave');
    $src = $dom->saveHTML();
    profile('postsave');
}

function saneImage($data) {
    // Pull the subtype out of the "data:image/xxx;base64," header to use as the file extension.
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    //file_put_contents($filename, base64_decode(substr($data, strpos($data, ';')+8)));
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    // Everything after ";base64," is the payload.
    fwrite($fh, substr($data, strpos($data, ';')+8));
    fclose($fh);
    profile('filesaved');
    return $filename;
}

function generateFilename($dir, $prefix, $suffix) {
    $dir = preg_replace('@/$@', '', $dir);
    do {
        $filename = sprintf("%s/%s%s.%s", $dir, $prefix, md5(mt_rand()), $suffix);
    } while( file_exists($filename) );
    return "foo.$suffix";
    return $filename;
}

profile('start');
$src = file_get_contents('derp.txt');
profile('load');
handleEmbedded($src);
profile('end');

Output:

     start      236296      243048
      load     1306264     1325312
   domload     1306640     2378768
   getimgs     1306880     2378768
 filesaved     2371080     4501168
   presave     1307264     4501168
  postsave      244152     4501168
       end      243480     4501168

As you can see, memory usage still jumps into the 4MB range while the file is saved, despite trying to shave bytes by using a stream filter. I think there's some buffering happening in the background. If I were simply transcribing between files I'd break the data into chunks, but I don't know whether that is feasible/advisable in this case.

Is there anywhere I might be able to pare down my memory usage?
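
For reference, a minimal sketch of the plain file-to-file transcription mentioned above, assuming the base64 payload already sits in its own file (payload.b64 and decoded.png are hypothetical names) and is a single unbroken base64 string:

<?php
// Copy from the encoded file to the output file; stream_copy_to_stream()
// moves the data in chunks internally, and the convert.base64-decode filter
// decodes each chunk as it is written out, so neither the encoded nor the
// decoded data should ever need to be held whole in a PHP string.
$in  = fopen('payload.b64', 'r');
$out = fopen('decoded.png', 'w');
stream_filter_append($out, 'convert.base64-decode', STREAM_FILTER_WRITE);
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

In my case, though, the payload is already in memory as part of the HTML string, so the question is whether something similar can be done there.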


Notes:

  • file_put_contents() and changing handleEmbedded() to not pass by reference have the same memory usage.
  • derp.txt contains a snippet of HTML with a single base64-encoded image.
  • 4MB is not the end of the world, but just yesterday someone tried to upload a 61MB JPEG, so who knows what someone will put in a rich-text box. :I

Props to Norbert for punching a hole in my mental block:

function saneImage($data) {
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    writefile($filename, $data);
    profile('filesaved');
    return $filename;
}

function writefile($filename, $data) {
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    $chunksize=12*1024;              // 12KB per write; a multiple of 4, so each chunk holds whole base64 quads
    $offset = strpos($data, ';')+8;  // skip the "data:image/xxx;base64," header
    // Feed the payload through the decode filter one chunk at a time instead of
    // substr()'ing the entire thing into a second multi-megabyte string.
    for( $i=0; $chunk=substr($data,($chunksize*$i)+$offset,$chunksize); $i++ ) {
        fwrite($fh, $chunk);
    }
    fclose($fh);
}

Output:

     start      237952      244672
      load     1307920     1327000
   domload     1308296     2380664
   getimgs     1308536     2380664
 filesaved     2372712     2400592
   presave     1308944     2400592
  postsave      245832     2400592
       end      245160     2400592
