PHP: Recursive htmlspecialchars on object

I want to establish a generic sanitizer for data that comes from various sources. By sanitizing I mean (at this stage) applying htmlspecialchars to strings. The data that comes from these sources can be anything from an object to an array to a string, all nested (and complicated), and the format is always a bit different.

So I thought of a recursive htmlspecialchars function that applies itself to arrays and objects, and only applies htmlspecialchars to strings, but how do I walk an object recursively?

Thanks.

EDIT: I think I should have mentioned this: I am actually building a RIA that relies heavily on JS and JSON for client-server communication. The only thing the server does is fetch data from the database and return it to the client via JSON, in the following format:

{"stat":"ok","data":{...}}

Now, as I said, the data could be anything, not only strings coming from a DB, but also data coming from an XML source. The workflow to process the JSON is as follows:

  1. Fetch data from the DB/XML (source encoding is iso-8859-1)
  2. Put them into the "data" array

  3. Recursively convert from iso-8859-1 to utf-8 using

     private function utf8_encode_deep(&$input) {
         if (is_string($input)) {
             $input = $this->str_encode_utf8($input);
         } else if (is_array($input)) {
             foreach ($input as &$value) {
                 $this->utf8_encode_deep($value);
             }
             unset($value);
         } else if (is_object($input)) {
             $vars = array_keys(get_object_vars($input));
             foreach ($vars as $var) {
                 $this->utf8_encode_deep($input->$var);
             }
         }
     }
  4. Use PHP's json_encode to convert the data into JSON

  5. Send (echo) the data to the client

  6. Render the data using JS (e.g. putting it into a table)

Somewhere in between, the data should be sanitized (at this stage only with htmlspecialchars). So the question should have been: where should the sanitizing happen, and with what method?

You would only want to escape when outputting into HTML. And you cannot output a complete array or object into HTML, so escaping everything seems invalid.

You have one level of indirection because of your JSON output. So you cannot decide in PHP what context the data is used for - JSON is still plain text, not HTML.

So to decide whether any data inside the JSON must be escaped for HTML, we must know how your JavaScript uses the JSON data.

Example: If your JSON is treated as plain text and contains something like <b>BOLD</b>, then the expected result when it is shown on an HTML page is exactly this text, including the characters that look like HTML tags, but without bold typesetting. This only happens if your JavaScript client handles the text as plain text, i.e. it does NOT assign it to innerHTML, because that would activate the HTML tags, but only to innerText or textContent, or uses an equivalent convenience method such as .text() in jQuery.
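To illustrate that JSON is only a transport format, here is a minimal sketch. The payload is made up and only mimics the {"stat":...,"data":...} format from the question. json_encode passes the markup through as text, and htmlspecialchars is applied only at the point where a value is actually written into HTML.

```php
<?php
// Hypothetical payload in the format described in the question.
$payload = array('stat' => 'ok', 'data' => '<b>BOLD</b>');

// JSON encoding does not HTML-escape anything; by default it only
// escapes the forward slash as \/.
$json = json_encode($payload);
echo $json, "\n"; // {"stat":"ok","data":"<b>BOLD<\/b>"}

// Decoding returns the exact original text, tags included.
$decoded = json_decode($json, true);

// Escaping belongs here, where the value enters an HTML context.
$html = '<td>' . htmlspecialchars($decoded['data'], ENT_QUOTES, 'UTF-8') . '</td>';
echo $html, "\n"; // <td>&lt;b&gt;BOLD&lt;/b&gt;</td>
```

If the client uses textContent or .text(), the server-side htmlspecialchars call in the last step is not needed at all, because the browser never parses the value as markup.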

If, on the other hand, you expect the JSON to include ready-made HTML that is assigned to innerHTML, then you have to escape this string before it is put into the JSON. BUT you must escape the whole string only if you do not want to add any formatting to it. Otherwise you are in a situation where templates mix predefined formatting with user content: the user content has to be escaped when it is put into the HTML context, but the resulting markup must not be escaped again, otherwise JavaScript cannot assign it to innerHTML and enable the formatting.
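The template case can be sketched as follows, assuming a hypothetical $userComment value supplied by a user and a fixed piece of markup chosen by the application (neither is from the original post). Only the user content is escaped; the surrounding formatting stays intact so that innerHTML can render it.

```php
<?php
// Hypothetical user-supplied text (an assumption for this example).
$userComment = 'I <3 "PHP" & <script>alert(1)</script>';

// Escape the user content once, at the moment it is placed into HTML.
$safe = htmlspecialchars($userComment, ENT_QUOTES, 'UTF-8');

// The template markup itself is trusted and must NOT be escaped again.
$fragment = '<p class="comment"><b>' . $safe . '</b></p>';

// The finished fragment travels inside JSON as an ordinary string.
$json = json_encode(array('stat' => 'ok', 'data' => array('html' => $fragment)));
echo $json, "\n";
```

Running a recursive htmlspecialchars over the finished array at this point would also escape the <p> and <b> tags of the template, which is exactly the double escaping described above.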

Basically, globally escaping everything inside your array or object is most likely wrong, unless you know that every single value will be used in an HTML context by your JavaScript.

You can try the following:

class MyClass {
    public $var1 = '<b>value 1</b>';
    public $var2 = '<b>value 2</b>';
    public $var3 = array('<b>value 3</b>');
}

$list = array();
$list[0]['nice'] = range("A", "C");
$list[0]['bad'] = array("<div>A</div>","<div>B</div>","<div>C</div>",new MyClass());
$list["<b>gloo</b>"] = array(new MyClass(),"<b>WOW</b>");

var_dump(__htmlspecialchars($list));

Function used:

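function __htmlspecialchars($data) {
    if (is_array($data)) {
        // Build a new array so that a key changed by escaping
        // does not leave the original, unescaped key behind.
        $escaped = array();
        foreach ($data as $key => $value) {
            $key = is_string($key) ? htmlspecialchars($key) : $key;
            $escaped[$key] = __htmlspecialchars($value);
        }
        return $escaped;
    }
    if (is_object($data)) {
        // get_object_vars() returns the current property values;
        // get_class_vars() would only return the class defaults.
        // Note: the object is modified in place; property names are kept.
        foreach (get_object_vars($data) as $key => $value) {
            $data->$key = __htmlspecialchars($value);
        }
        return $data;
    }
    // Escape strings only; integers, booleans, null etc. stay unchanged.
    return is_string($data) ? htmlspecialchars($data) : $data;
}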
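One detail worth checking when walking objects is which function supplies the property values. The short sketch below uses a hypothetical Item class (not part of the original answer) to show the difference: get_class_vars returns the defaults declared in the class, while get_object_vars returns what the instance currently holds.

```php
<?php
// Hypothetical class used only for this illustration.
class Item {
    public $name = '<i>default</i>';
}

$item = new Item();
$item->name = '<b>changed at runtime</b>';

// Class defaults, regardless of what the instance holds now.
$defaults = get_class_vars(get_class($item));
// Current values of the properties accessible from this scope.
$current = get_object_vars($item);

echo $defaults['name'], "\n"; // <i>default</i>
echo $current['name'], "\n";  // <b>changed at runtime</b>
```

A sanitizer that reads from get_class_vars and writes the result back would therefore reset every property to its escaped default value.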

The output is something like:

array
  0 => 
    array
      'nice' => 
        array
          0 => string 'A' (length=1)
          1 => string 'B' (length=1)
          2 => string 'C' (length=1)
      'bad' => 
        array
          0 => string '&lt;div&gt;A&lt;/div&gt;' (length=24)
          1 => string '&lt;div&gt;B&lt;/div&gt;' (length=24)
          2 => string '&lt;div&gt;C&lt;/div&gt;' (length=24)
          3 => 
            object(MyClass)[1]
              ...


    array
      0 => 
        object(MyClass)[2]
          public 'var1' => string '&lt;b&gt;value 1&lt;/b&gt;' (length=26)
          public 'var2' => string '&lt;b&gt;value 2&lt;/b&gt;' (length=26)
          public 'var3' => 
            array
              ...

function htmlrecursive($data){
    if (is_array($data)){
        foreach ($data as &$d){
            $d = htmlrecursive($d);
        }
        unset($d); // break the reference left over from the loop
        return $data;
    }
    // Escape strings only; other scalars and null are returned unchanged.
    return is_string($data) ? htmlspecialchars($data) : $data;
}

$array = htmlrecursive($array);

For objects, you need to implement the ArrayAccess interface; then you can use array_walk_recursive.

Also check this question: "Getting an object to work with array_walk_recursive in PHP".
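As an alternative sketch that does not rely on ArrayAccess, the data can first be turned into plain nested arrays and then walked with array_walk_recursive. The example below does the conversion with a JSON round trip, which is a swapped-in technique and not what the answer above describes. It assumes that all strings are valid UTF-8, that only public properties matter, and that returning arrays instead of the original objects is acceptable. The sample data is hypothetical.

```php
<?php
// Hypothetical nested data mixing arrays and a plain object.
$row = new stdClass();
$row->title = '<b>WOW</b>';
$row->tags = array('<i>a</i>', 'b');
$data = array('rows' => array($row), 'count' => 1);

// Round trip through JSON: public properties become associative arrays.
$plain = json_decode(json_encode($data), true);

// The callback receives each leaf by reference; only strings are escaped.
array_walk_recursive($plain, function (&$value) {
    if (is_string($value)) {
        $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }
});

echo $plain['rows'][0]['title'], "\n"; // &lt;b&gt;WOW&lt;/b&gt;
```

Since the workflow in the question ends in json_encode anyway, losing the object types in this step should not change the JSON that reaches the client, except that empty objects would be emitted as empty arrays.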
