Validate the contents of uploaded files

I'm developing a "plug 'n play" system in which individual components can be registered and associated with an uploaded file using the Application GUI.

But to be really "plug 'n play", the Application must recognize the component, and since each component is a class, I could accomplish this by using interfaces.

But how can I validate the contents of an uploaded file, searching for a specific interface?

My first thought was to use Tokenizer, but this proved harder than I expected. A simple test component file like this:

<?php

class ValidComponent implements Serializable {

    public serialize() {}
    public unserialize( $serialized ) {}
}

After being passed through token_get_all(), it resulted in:

Array
(
    [0] => Array
        (
            [0] => T_OPEN_TAG
            [1] => <?php

            [2] => 1
        )

    [1] => Array
        (
            [0] => T_WHITESPACE
            [1] => 

            [2] => 2
        )

    [2] => Array
        (
            [0] => T_CLASS
            [1] => class
            [2] => 3
        )

    [3] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 3
        )

    [4] => Array
        (
            [0] => T_STRING
            [1] => ValidComponent
            [2] => 3
        )

    [5] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 3
        )

    [6] => Array
        (
            [0] => T_IMPLEMENTS
            [1] => implements
            [2] => 3
        )

    [7] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 3
        )

    [8] => Array
        (
            [0] => T_STRING
            [1] => Serializable
            [2] => 3
        )

    [9] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 3
        )

    [10] => U
    [11] => Array
        (
            [0] => T_WHITESPACE
            [1] => 


            [2] => 3
        )

    [12] => Array
        (
            [0] => T_PUBLIC
            [1] => public
            [2] => 5
        )

    [13] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 5
        )

    [14] => Array
        (
            [0] => T_STRING
            [1] => serialize
            [2] => 5
        )

    [15] => U
    [16] => U
    [17] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 5
        )

    [18] => U
    [19] => U
    [20] => Array
        (
            [0] => T_WHITESPACE
            [1] => 

            [2] => 5
        )

    [21] => Array
        (
            [0] => T_PUBLIC
            [1] => public
            [2] => 6
        )

    [22] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 6
        )

    [23] => Array
        (
            [0] => T_STRING
            [1] => unserialize
            [2] => 6
        )

    [24] => U
    [25] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 6
        )

    [26] => Array
        (
            [0] => T_VARIABLE
            [1] => $serialized
            [2] => 6
        )

    [27] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 6
        )

    [28] => U
    [29] => Array
        (
            [0] => T_WHITESPACE
            [1] =>  
            [2] => 6
        )

    [30] => U
    [31] => U
    [32] => Array
        (
            [0] => T_WHITESPACE
            [1] => 

            [2] => 6
        )

    [33] => U
)

Not only is this inefficient, because real components might be much bigger and produce huge arrays, but I also don't think it's very reliable.

I could certainly search this structure recursively for the name of a specific interface, but that would give false positives whenever the interface name appears elsewhere in the code (comments, regular strings...).

I would like to avoid text comparison or regular expressions if possible, but I don't know whether it's possible to create an isolated sandbox to evaluate the uploaded file in order to use Reflection.

DISCLAIMER:

So you want to build a "system" where users can upload PHP files that, in turn, will be used by said system?

Unless you completely trust the users, or the system is used in a context where the uploader is trusted 100%, like a development environment, this is EXTREMELY insecure...


Parsing the file yourself with Tokenizer:

That being said, the best and probably the only MILDLY SANE way to analyze a PHP file without running it is with Tokenizer.

For instance, if you only wish to know if that file contains a class that implements a predetermined interface:

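$source = file_get_contents('file.php');
$tokens = token_get_all($source);

// True if the file begins with a PHP open tag.
function startsWithOpenTag($tokens)
{
    return isset($tokens[0][0]) && $tokens[0][0] === T_OPEN_TAG;
}

// True if an "implements" clause in the file names $interfaceName.
function searchForInterface($tokens, $interfaceName)
{
    $count = count($tokens);
    foreach ($tokens as $i => $tk) {
        // Array tokens are [id, text, line]; single characters are plain strings.
        if (is_array($tk) && $tk[0] === T_IMPLEMENTS) {
            // Scan the interface list up to the opening brace of the class body.
            for ($ii = $i + 1; $ii < $count; ++$ii) {
                if ($tokens[$ii] === '{') {
                    break;
                }
                // Index 1 holds the token text (index 2 is the line number).
                if (is_array($tokens[$ii]) && ltrim($tokens[$ii][1], '\\') === $interfaceName) {
                    return true;
                }
            }
        }
    }
    return false;
}

var_dump(startsWithOpenTag($tokens));
var_dump(searchForInterface($tokens, 'Serializable'));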

This is enough. However, it does not guarantee that the file is free of parse errors or logic errors. In fact, short of building a complete PHP parser yourself (which is kind of INSANE), the only way to know for sure that a file works is to run it.
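For the parse-error part specifically, a minimal sketch is possible without executing the file, assuming PHP 7 or later: passing the TOKEN_PARSE flag to token_get_all() makes it run the parser and throw a ParseError on a syntax error. The helper name hasValidSyntax is hypothetical, and this only checks syntax, not whether the class actually satisfies the interface or behaves correctly.

```php
<?php
// Hypothetical helper: report whether PHP source parses, without executing it.
// With TOKEN_PARSE, token_get_all() runs the parser and throws ParseError
// when the source contains a syntax error.
function hasValidSyntax(string $source): bool
{
    try {
        token_get_all($source, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        return false;
    }
}

// The component from the question (methods declared without "function").
$broken = '<?php class ValidComponent implements Serializable { public serialize() {} }';
// The same declaration with the "function" keyword added.
$fixed  = '<?php class ValidComponent implements Serializable { public function serialize() {} }';

var_dump(hasValidSyntax($broken));
var_dump(hasValidSyntax($fixed));
```

The first call is expected to report false and the second true. Note that the second class would still fail when loaded, because it does not implement unserialize(); that kind of error only shows up when the file is actually run.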


Creating a running sandbox:

The best way to accomplish what you want is probably creating a PHP sandbox. You can do this by starting another PHP process/thread.

With Runkit:

Runkit is an extension that provides means to modify constants, user-defined functions, and user-defined classes. It also provides for custom superglobal variables and embeddable sub-interpreters via sandboxing.

Runkit_Sandbox class creates a new thread with its own scope and program stack. Using a set of options passed to the constructor, this environment may be restricted to a subset of what the primary interpreter can do and provide a safer environment for executing user supplied code.

With pure PHP:

You can create a sort of "sandbox" by opening another PHP process, for instance with proc_open or exec, that contains the sandbox logic and is responsible for parsing and testing the uploaded file.

In this example we create 3 files:

  • main.php is your Application GUI. It's responsible for getting the file and the correct parameters and options, and then starting the new PHP process.
  • sandbox.php is your sandbox script. It will include/require the file you want to "test" and use reflection to test the class.
  • file.php is the uploaded file you want to test (I used your example, but with a valid class).

Have a look at Symfony Console and Symfony Config components that may help accomplish this.


main.php

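$sandBoxWrapperPath = realpath('sandbox.php');
$uploadedFile = realpath('file.php');
$className = '\ValidComponent';

// escapeshellarg() quotes each argument so it reaches the shell as one word
$command = 'php ' . escapeshellarg($sandBoxWrapperPath)
    . ' -f ' . escapeshellarg($uploadedFile)
    . ' -c ' . escapeshellarg($className);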

$descriptorspec = array(
   1 => array("pipe", "w"), // STDOUT
   2 => array("pipe", "w")  // STDERR
);

$phpSandBox = proc_open($command, $descriptorspec, $pipes);


if (is_resource($phpSandBox)) {

    $stdOut = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $stdErr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    $exitCode = proc_close($phpSandBox);


    echo "STDOUT: " . $stdOut . PHP_EOL . PHP_EOL;
    echo "STDERR: " . $stdErr . PHP_EOL . PHP_EOL;
}
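The command built in main.php could also pass restrictive ini overrides to the child process with the CLI's -d switch. The following is only a sketch: the helper name buildSandboxCommand, its $phpBinary parameter, and the directive values are assumptions chosen for illustration, not a vetted security policy, and ini restrictions alone should not be treated as a real security boundary.

```php
<?php
// Hypothetical helper: build the sandbox command with restrictive ini overrides.
// The directive values below are illustrative assumptions, not a vetted policy.
function buildSandboxCommand(
    string $wrapper,
    string $uploadedFile,
    string $className,
    string $phpBinary = 'php'
): string {
    $ini = [
        // Block functions that could start further processes.
        'disable_functions' => 'exec,passthru,shell_exec,system,proc_open,popen',
        // Limit file access to the directories of the wrapper and the upload.
        'open_basedir'      => dirname($wrapper) . PATH_SEPARATOR . dirname($uploadedFile),
        'memory_limit'      => '32M',
    ];

    $parts = [escapeshellarg($phpBinary)];
    foreach ($ini as $name => $value) {
        $parts[] = '-d';
        $parts[] = escapeshellarg($name . '=' . $value);
    }

    // Everything after the script path is passed to sandbox.php for getopt().
    $parts[] = escapeshellarg($wrapper);
    $parts[] = '-f';
    $parts[] = escapeshellarg($uploadedFile);
    $parts[] = '-c';
    $parts[] = escapeshellarg($className);

    return implode(' ', $parts);
}
```

The returned string can be handed to proc_open() in place of $command. Stronger isolation (a separate OS user, a container, resource limits) would have to come from outside PHP.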

sandbox.php

$shortopts  = "";
$shortopts .= "f:";  // Uploaded File
$shortopts .= "c:";  // Name of the class, with namespace

$opts = getopt($shortopts);

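if (!isset($opts['f'])) {
    fwrite(STDERR, 'File parameter is required' . PHP_EOL);
    exit(1); // non-zero exit code signals failure to main.php
}

// Alternatively, you can use tokenizer to pre-parse the file
// and find the class name that way instead of passing it in
if (!isset($opts['c'])) {
    fwrite(STDERR, 'Class parameter is required' . PHP_EOL);
    exit(1);
}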

$file = $opts['f'];
$className = $opts['c'];

require $file;
$refClass = new ReflectionClass($className);

//Do stuff with reflection
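As a rough sketch of the two steps hinted at in sandbox.php, the first helper below finds declared class names with the tokenizer and the second does the reflection check. Both function names (findDeclaredClasses, classImplements) are hypothetical, and the class finder is simplified: it assumes the namespace keyword appears only in namespace declarations.

```php
<?php
// Hypothetical helper: list the fully qualified names of classes declared in
// $source, using only the tokenizer (the source is never executed).
// Simplification: assumes "namespace" appears only in namespace declarations.
function findDeclaredClasses(string $source): array
{
    $tokens    = token_get_all($source);
    $count     = count($tokens);
    $skip      = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $namespace = '';
    $classes   = [];
    $prevId    = null; // id of the previous significant token

    for ($i = 0; $i < $count; ++$i) {
        $id = is_array($tokens[$i]) ? $tokens[$i][0] : null;
        if (in_array($id, $skip, true)) {
            continue;
        }
        if ($id === T_NAMESPACE) {
            // Join the name tokens up to ';' or '{' (covers PHP 7 and PHP 8 token layouts).
            $namespace = '';
            for ($j = $i + 1; $j < $count && $tokens[$j] !== ';' && $tokens[$j] !== '{'; ++$j) {
                if (is_array($tokens[$j]) && !in_array($tokens[$j][0], $skip, true)) {
                    $namespace .= $tokens[$j][1];
                }
            }
        } elseif ($id === T_CLASS && $prevId !== T_DOUBLE_COLON) {
            // "Foo::class" is skipped above; anonymous classes have no T_STRING name.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j]) && in_array($tokens[$j][0], $skip, true)) {
                ++$j;
            }
            if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $classes[] = ($namespace === '' ? '' : $namespace . '\\') . $tokens[$j][1];
            }
        }
        $prevId = $id;
    }

    return $classes;
}

// Hypothetical helper: true if $className is loaded and implements $interfaceName.
function classImplements(string $className, string $interfaceName): bool
{
    try {
        return (new ReflectionClass($className))->implementsInterface($interfaceName);
    } catch (ReflectionException $e) {
        // Thrown when the class or the interface does not exist.
        return false;
    }
}
```

In sandbox.php, after the require, something like exit(classImplements($className, 'Serializable') ? 0 : 1); would let main.php decide from the exit code alone.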

Extra:

There are a couple of PHP sandboxes on GitHub:

The projects don't seem very active though...

