When (and why then) and how should I sanitize data received as POST JSON in PHP, so that the output is usable in Swift AND HTML?

Over the past couple of days, I've read through a lot of resources on sanitizing input and output data in PHP to prevent (most prominently) XSS and SQL injection, including a number of questions on SO. At this point, however, I feel more confused and less sure about what I am and am not supposed to do, partly because of contradictory information. For example, I've read many times that I don't need mysqli_real_escape_string or any other form of input sanitization whatsoever if I use prepared statements, while other sources say I should use it anyway, or even that I should sanitize input like so; this page by Apple goes over the topic rather roughly(?); etc. I would therefore really appreciate some clarification on what I am supposed to do, preferably (but not necessarily) from someone with experience in server-side security, e.g. from working in the field, having done a lot of research on it, or maybe even having been on the attacker's side(?).

To help you understand my situation, I'll go over it as concisely as possible:

I am currently programming an iOS app in Swift and need to send some data to my server, where it is saved in an SQL table and can be retrieved by other users (e.g. for a blog).

To do this, I send the data via POST, encoded as JSON, to my server ("myphp.php", using Alamofire, which shouldn't matter much) and decode it there. This is the first spot where I'm not sure whether I should already sanitize my data in some way (with reference to the question linked above). Next, I insert it into a table (for example) using prepared statements (MySQL, so nothing is emulated). Moreover, I would also like the data I output to be usable in HTML, or rather the whole PHP script to be usable via AJAX, too.

Here is an example of what I mean:

// SWIFT (Alamofire 4 API)
import Alamofire

// set parameters for request
let parameters: Parameters = [
    "key": "value",
    // ...
]

// request with JSON-encoded parameters
// ("myphp.php" is a placeholder; Alamofire needs the script's absolute URL)
Alamofire.request("myphp.php", method: .post, parameters: parameters, encoding: JSONEncoding.default)
    .validate()
    .responseJSON(completionHandler: { (response) in
        // do things with data (e.g. show blog post)
    })

<?php
// PHP
header('Content-Type: application/json');

$decodedPost = json_decode(file_get_contents('php://input'), true);

// what to do with input...?

// PREPARED STATEMENTS: insert, select, etc.

// what to do with output...?

// echo response - json-encoded so that
// json completion handler in swift can work with it
// ($output: the data to send back, e.g. rows from a SELECT)
echo json_encode($output, JSON_NUMERIC_CHECK);

I asked a friend for advice on this, and he told me he always does the following, whether the data is being input or output (xss_clean() is a function he sent me, too):

$key = xss_clean(mysqli_real_escape_string($db, trim(htmlspecialchars($data))));
// e.g. $data = $decodedPost["key"]

However, not only does my research suggest that this probably isn't necessary, but he also told me it has limitations, most obviously when data is supposed to be retrieved from the server again and displayed, e.g. to another user, as close to the original input as possible.

As you can see, I am really confused. I want to protect the data users send to the server as well as I can, so this is a very important topic for me. I hope this question isn't too broad, but, as I said, many other questions were either at least partly contradictory or very old, e.g. still using the old mysql extension and no prepared statements. If you need more information, feel free to ask. References to official documentation supporting your answer are very much appreciated. Thank you!

Input sanitization is a misleading term: it suggests that you can wave a magic wand at any data and make it "safe data". The problem is that the definition of "safe" changes depending on which piece of software interprets the data, and so do the encoding requirements. Similarly, what counts as "valid" data depends on context: your data may very well need special characters (', ", &, <). Note that SO allows all of these as data.

Output that is safe to embed in an SQL query may not be safe to embed in HTML. Or Swift. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe in all of those contexts (and many others) is far too restrictive.
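
To make this concrete, here is a small PHP sketch (the sample value is made up). The same string is encoded with htmlspecialchars(), rawurlencode() and json_encode(), and each gives a different result. The HTML-escaped version still contains $(id) and ;, which would be dangerous if that string were later pasted into a shell command. No single "sanitized" form covers every context.

```php
<?php
declare(strict_types=1);

// One user-supplied value (made up for this example), encoded for three
// different output contexts. Each encoder protects only its own context.
$value = 'x"; echo $(id) <b>';

// HTML: <, >, & and quotes become entities; $( ) and ; are left untouched.
$forHtml = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');

// URL: everything except letters, digits and -_.~ is percent-encoded.
$forUrl = rawurlencode($value);

// JSON: a string literal with the double quote backslash-escaped.
$forJson = json_encode($value);

echo $forHtml, PHP_EOL; // x&quot;; echo $(id) &lt;b&gt;
echo $forUrl, PHP_EOL;  // x%22%3B%20echo%20%24%28id%29%20%3Cb%3E
echo $forJson, PHP_EOL; // "x\"; echo $(id) <b>"
```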

So what should we do? Make sure the data is never in a position to do harm. The best way to achieve this is to avoid interpreting the data in the first place. Parameterized SQL queries are an excellent example: the parameters are never interpreted as SQL; they're simply stored in the database as, well, data.
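
A minimal sketch of that idea. As an assumption, to keep it self-contained, it uses PDO with an in-memory SQLite database (this needs the pdo_sqlite extension) instead of the MySQL/mysqli setup from the question. The raw input, including quotes, an attempted "DROP TABLE" and a script tag, is bound as a parameter without any escaping, and it comes back unchanged.

```php
<?php
declare(strict_types=1);

// Stand-in database: in-memory SQLite via PDO. The principle is the same
// with MySQL: the value is bound as a parameter, never spliced into the SQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)');

// Raw user input as it might come out of json_decode(): no escaping, no stripping.
$body = "Robert'); DROP TABLE posts; -- <script>alert(1)</script>";

$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $body]);

$select = $pdo->prepare('SELECT body FROM posts WHERE id = :id');
$select->execute([':id' => (int) $pdo->lastInsertId()]);
$stored = $select->fetchColumn();

// The table still exists, and the text comes back byte-for-byte unchanged.
var_dump($stored === $body); // bool(true)
```

With mysqli, the equivalent is $db->prepare() with ? placeholders, followed by $stmt->bind_param('s', $body) and $stmt->execute(). Note that the stored value still contains <script>, which is fine: it only becomes a problem if it is later written into HTML without escaping.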

That same data may be used in other formats, such as HTML. In that case, the data should be encoded/escaped for that particular language at the moment it is embedded. So, to prevent XSS, data should be HTML-escaped (or JavaScript- or URL-escaped, depending on where it lands) at the time it is written into the output, not at input time. The same applies to every other embedding context.
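
A sketch of what "escape at output time" looks like for the two consumers in the question, an HTML page and the Swift app. The helper name e() and the sample value are hypothetical. For comparison, it also shows what happens if the value had already been HTML-escaped when it was stored, which is the limitation mentioned in the question: the HTML page double-encodes it, and the Swift app would receive entities instead of the user's text.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: HTML-escape a value at the moment it is written into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
}

// The value as stored in the database: the user's original text, unescaped.
$stored = 'Tom & Jerry <3';

// Context 1: an HTML page. Escape now, right where it is embedded.
$html = '<h2>' . e($stored) . '</h2>';

// Context 2: the JSON response for the Swift app. json_encode() does the JSON
// escaping; no HTML entities are involved, so Swift gets the original text.
$json = json_encode(['title' => $stored]);

// Anti-pattern for comparison: escaped at input time, then escaped again on output.
$escapedAtInput = e($stored);
$doubleEscaped  = '<h2>' . e($escapedAtInput) . '</h2>';

echo $html, PHP_EOL;          // <h2>Tom &amp; Jerry &lt;3</h2>
echo $json, PHP_EOL;          // {"title":"Tom & Jerry <3"}
echo $doubleEscaped, PHP_EOL; // <h2>Tom &amp;amp; Jerry &amp;lt;3</h2>
```

The browser renders the first line as "Tom & Jerry <3" but the last one as "Tom &amp; Jerry &lt;3". If an AJAX client inserts values from the JSON response into a page, the same rule applies on that side: treat them as text (e.g. via textContent) rather than HTML.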

So, should we just pass anything we get straight to the database?

No - there are definitely things you can check about user input, but this is highly context-dependent. Let's call this what it is - validation. Make sure this is done on the server. Some examples:

  • If a field is supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
  • You can often check that a particular value is one of a set of known values (white list validation)
  • You can require most fields to have a minimum and maximum length.
  • You should usually verify that any string contains only valid characters for its encoding (eg, no invalid UTF-8 sequences)
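
The checks in this list translate fairly directly into PHP. Below is a minimal sketch. It assumes a decoded JSON body with three hypothetical fields (category_id, visibility, title), and the limits are made up. It only accepts or rejects; it never rewrites a value, so a title such as Tom & "Jerry" <3 passes untouched and is escaped later, when it is output. The UTF-8 checks use PCRE rather than mbstring so the sketch does not depend on an optional extension.

```php
<?php
declare(strict_types=1);

// Hypothetical whitelist for a hypothetical "visibility" field.
const ALLOWED_VISIBILITY = ['public', 'friends', 'private'];

/**
 * Validates a decoded JSON body (json_decode(..., true)) and returns a list of
 * error messages; an empty list means the request passed. Field names and
 * limits are illustrative. Values are only checked, never modified.
 */
function validatePost(array $input): array
{
    $errors = [];

    // Integer field: a JSON integer or an integer string, and at least 1.
    $categoryId = filter_var(
        $input['category_id'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($categoryId === false) {
        $errors[] = 'category_id must be a positive integer';
    }

    // White-list validation: the value must be one of a known set.
    if (!in_array($input['visibility'] ?? null, ALLOWED_VISIBILITY, true)) {
        $errors[] = 'visibility must be one of: ' . implode(', ', ALLOWED_VISIBILITY);
    }

    // String field: must be a string and valid UTF-8 (preg_match with the
    // /u modifier returns false on invalid UTF-8), then 1 to 200 characters.
    $title = $input['title'] ?? null;
    if (!is_string($title) || preg_match('//u', $title) !== 1) {
        $errors[] = 'title must be a valid UTF-8 string';
    } else {
        $length = preg_match_all('/./su', $title); // counts characters, not bytes
        if ($length < 1 || $length > 200) {
            $errors[] = 'title must be between 1 and 200 characters';
        }
    }

    return $errors;
}

// Usage: special characters are fine; they only have to be escaped on output.
$good = ['category_id' => '7', 'visibility' => 'public', 'title' => 'Tom & "Jerry" <3'];
$bad  = ['category_id' => 'abc', 'visibility' => 'secret', 'title' => "\xC3\x28"];

echo count(validatePost($good)), PHP_EOL; // 0
foreach (validatePost($bad) as $error) {  // one message per invalid field
    echo $error, PHP_EOL;
}
```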

As you can see, these checks are very context-dependent, and all of them serve to increase the odds that you end up with data that makes sense. They should not be your only defense against malicious input (SQL injection, XSS, command injection, etc.), because validation is not the place for that defense. It belongs at the point where the data is embedded into SQL, HTML, a shell command, and so on, as described above.
