Nov
03

PHP Security, Validate User Input Part I


Your users’ data is useless if it isn’t used. And yet, paradoxically, that data is endangered by the very act of accessing it. Particularly dangerous are the accesses occasioned by users’ queries, submitted typically via form input. Legitimate users may accidentally make requests that turn out to be dangerous; illegitimate users will carefully craft requests that they know are dangerous, hoping that they can slip them past you.
In this article, we introduce the concept of input validation, beginning with a discussion of why it is so important to the overall security of your applications. PHP’s relaxed attitude toward variables (allowing them to be used without having been declared, and converting types automatically) is ironically an open door to possible trouble. If you are to fulfill your ultimate goal of safeguarding your users’ data, then, you will have to pay special attention to validating the data that users submit to your scripts. The process of validating that data is the topic of this article. We will build a PHP class that acts as an abstraction layer for user input, and expand it in a modular way so that it can safely validate values as belonging to specific data types and formats. Finally, we discuss strategies for finding input validation vulnerabilities in your applications.
There is no one class of attack that form validation prevents. Rather, proper checking and limiting of user input will cut off avenues that could have been used for many of the kinds of attacks we will be discussing in Part 3 of this book, including SQL injection, file discovery, remote execution, and still other attacks that don’t even have names yet. Form validation generally attempts to prevent exploits by stopping abusive or resource-intensive operations before they ever start.

Input Containing Metacharacters 
Even the most ordinary alphanumeric input could potentially be dangerous if it were to contain one of the many characters known as metacharacters, characters that have special meaning when processed by the various parts of your system. These characters are easy for an attacker to send as a value because they can simply be typed on the keyboard, and are fairly high-frequency characters in normal text. One set of metacharacters includes those that trigger various commands and functions built into the shell. Here are a few examples:
$ ^ & * ( ) ~ [ ] | { } ' " ; < > ? - `
These characters could, if used unquoted in a string passed as a shell argument by PHP, result in an action you, the developer, most likely did not intend to have happen. Another set of metacharacters includes those that have special meaning in database queries: 
' " ; \
Depending on how the query is structured and executed, these characters could be used to inject additional SQL statements into the query, and possibly execute additional, arbitrary queries.
There is another group of characters that are not easy to type, and not so obviously dangerous, but that could represent a threat to your system and databases. These are the first 32 characters in the ASCII (or Unicode) standard character set, sometimes known as control characters because they were originally used to control certain aspects of the display and printing of text. Although any of these characters might easily appear in a field containing binary values (like a blob), most of them have no business in a typical string. There are, however, a few that might find their way into even a legitimate string:

  • The character \x00, otherwise known as ASCII 0 or NUL (the null byte).
  • The characters \x0a and \x0d, otherwise known as ASCII 10 and 13, or the \n and \r line-end characters.
  • The character \x1a, otherwise known as ASCII 26, which serves as an end-of-file marker.

Any one of these characters or codes, appearing unexpectedly in a user’s text input, could at best confuse or corrupt the input, and at worst permit the injection of some attacking command or script.
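As a rough illustration of screening for these characters, the sketch below flags any of the first 32 ASCII characters in a string. The function name hasControlChars() and its $allowLineBreaks flag are hypothetical, and the choice to tolerate tab, \n, and \r in multi-line fields is an assumption rather than a rule from this article.

```php
<?php
// Hypothetical helper: reports whether $value contains ASCII control
// characters (\x00 through \x1f). For multi-line fields such as a textarea,
// tab, \n and \r can be tolerated by passing true as the second argument.
function hasControlChars(string $value, bool $allowLineBreaks = false): bool
{
    $pattern = $allowLineBreaks
        ? '/[\x00-\x08\x0b\x0c\x0e-\x1f]/' // everything except \t, \n, \r
        : '/[\x00-\x1f]/';                 // every control character
    return preg_match($pattern, $value) === 1;
}

var_dump(hasControlChars("plain text"));              // no control characters
var_dump(hasControlChars("two\nlines", true));        // line break tolerated
var_dump(hasControlChars("file.php\x00.jpg", true));  // NUL byte is flagged
```

Whether to reject such input outright or strip the offending characters is a design decision that depends on the field; rejecting is usually the easier behavior to reason about.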
Finally, there is the large group of multibyte Unicode characters above \xff that represent non-Latin characters and punctuation. Behind the scenes, a PHP string is a sequence of single bytes, which means there are only 256 possible values that each byte can have. Unicode encodings such as UTF-8 and UTF-16 therefore represent these characters as sequences of 2 to 4 bytes, which together map to most human alphabets and a large number of symbols. These multibyte characters are meaningless if broken into single bytes, and possibly dangerous if fed into programs that expect ASCII text. PHP itself can handle multibyte characters safely through its mbstring extension (see http://php.net/mbstring for information), but other programs, databases, and file systems might not.
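The difference between bytes and characters can be seen in a short sketch. It assumes the mbstring extension is loaded and that your application expects UTF-8 input; the helper name isValidUtf8() is hypothetical.

```php
<?php
// Assumption: the mbstring extension is available and input should be UTF-8.
// Hypothetical helper: true only if $value is a well-formed UTF-8 string.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

$eAcute = "\xc3\xa9"; // the two-byte UTF-8 encoding of "é"

var_dump(strlen($eAcute));             // int(2): strlen() counts bytes
var_dump(mb_strlen($eAcute, 'UTF-8')); // int(1): mb_strlen() counts characters
var_dump(isValidUtf8($eAcute));        // bool(true)
var_dump(isValidUtf8("\xc3"));         // bool(false): a lone lead byte
```

Checking the encoding before passing a value on to a database or another program is one way to avoid handing it a sequence that has been broken into meaningless single bytes.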

Wrong Type of Input
Input values that are of an incorrect data type or invalid format are highly likely to have unintended, and therefore undesirable, effects in your applications. At best, they will throw errors that could leak information about the underlying system. At worst, they may provide avenues of attack.
Here are some simple examples:

  • If you expect a date, which you are going to use to build a Unix timestamp, and some other type of value is sent instead, the generated timestamp will be for 31 December 1969, which is second -1 on Unix systems.
  • Image processing applications are likely to choke if they are provided with nonimage input.
  • Filesystem operations will fail with unpredictable results if they are given binary data (or, depending on your operating system, most standard punctuation marks) as part of a filename.
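For the date case, one possible check is to parse the value with a strict format and then confirm that it formats back to exactly the same string. This is only a sketch: the function name parseDate(), the YYYY-MM-DD format, and the use of UTC are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: accepts only a real calendar date written as
// YYYY-MM-DD and returns the Unix timestamp for midnight UTC on that day.
// Anything else yields null instead of a bogus timestamp.
function parseDate(string $input): ?int
{
    // The leading "!" resets all unparsed fields, so the time is 00:00:00.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $input, new DateTimeZone('UTC'));

    // createFromFormat() rolls impossible dates forward (30 February becomes
    // early March), so the round-trip comparison rejects them as well.
    if ($date === false || $date->format('Y-m-d') !== $input) {
        return null;
    }
    return $date->getTimestamp();
}

var_dump(parseDate('2000-01-01')); // int(946684800)
var_dump(parseDate('2023-02-30')); // NULL: not a real date
var_dump(parseDate('tomorrow'));   // NULL: wrong format
```

Returning null forces the calling code to handle the failure explicitly rather than carrying on with a timestamp from 1969.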

Too Much Input
Input values that are too large may tie up your application, run afoul of resource limits, or cause buffer overflow conditions in underlying libraries or executed applications. Here are examples of some possibilities:

  • If you intend to spellcheck the input from an HTML text area on a comment form, and you don’t limit the amount of text that can be sent to the spellchecker, an attacker could send as much as 8MB of text (the default post_max_size in php.ini, and also the default memory_limit in older PHP versions) per submission. At best, this could slow your system down; at worst, it could crash your application or even your server.
  • Some database fields are limited to 255 or fewer characters. Any user input that is longer may be silently truncated, thus losing a portion of what the user expected to be stored there.
  • Filenames have length limits. Filesystem utilities that receive too much input may either continue after silently truncating the desired name (with probably disastrous results), or crash.
  • Buffer overflow is of course the primary danger with too-long input, though thankfully not within PHP itself. A buffer overflow occurs when a user enters a quantity of data larger than the amount of memory allocated by an application to receive it. The end of the data overflows into the memory following the end of the buffer, with the following possible results:
    • An existing variable might be overwritten.
    • A harmless application error might be generated, or the application may crash.
    • An instruction might be overwritten with an instruction that executes uploaded code.
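A length check before any expensive or downstream operation addresses most of the cases above. The sketch below is illustrative only: the constant names, the limits, and the withinLimit() helper are hypothetical and should be replaced with values that match your own forms and database schema.

```php
<?php
// Hypothetical limits, in bytes; pick values that match your own schema.
const MAX_COMMENT_BYTES = 10000;
const MAX_NAME_BYTES = 255;

// Hypothetical helper: true if $value fits within $maxBytes.
// strlen() counts bytes, which is what storage and memory limits care about.
function withinLimit(string $value, int $maxBytes): bool
{
    return strlen($value) <= $maxBytes;
}

$comment = str_repeat('x', 20000); // stand-in for an oversized submission

if (!withinLimit($comment, MAX_COMMENT_BYTES)) {
    // Reject here, before the text ever reaches a spellchecker or database.
    echo "Comment is too long.\n";
}
```

Rejecting the value, rather than truncating it, avoids silently discarding part of what the user submitted.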

Abuse of Hidden Interfaces
A hidden interface is some layer of your application, such as an administrative interface, which an attacker could access by handcrafting a form or request. For an extremely basic example of how such a hidden interface might be exploited, consider the following fragment of a script:

  <form id="editObject">
  name: <input type="text" name="name" /><br />
  <?php
  // $username is assumed to have been set earlier by the login code.
  // Only an administrator is shown the delete checkbox.
  if ( $username == 'admin' ) {
    print 'delete: <input type="checkbox" name="delete" value="Y" /><br />';
  }
  ?>
  <input type="submit" value="Submit" />
  </form>

A user who is not an administrator uses a version of the form that has only a name input. But an administrator’s version of the form contains an extra input field named delete, which will cause the object to be deleted. The script that handles the form does not expect any value for the delete variable to be coming in from a regular user. But an attacker might very well be able to construct her own editObject form and try to use it to delete objects from the system.
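One way to close this hole is for the script that handles the form to repeat the permission check on the server instead of trusting that only administrators can send a delete value. The following is a minimal sketch: the function requestedActions() and the returned array shape are hypothetical, and $username is assumed to come from the server-side session, never from the submitted form.

```php
<?php
// Hypothetical handler logic for the editObject form.
// $post is the submitted form data (normally $_POST); $username is assumed
// to come from the server-side session.
function requestedActions(array $post, string $username): array
{
    $actions = [];

    if (isset($post['name']) && is_string($post['name'])) {
        $actions['rename'] = $post['name'];
    }

    // Honor "delete" only when the server itself knows the user is the
    // administrator, no matter what the submitted form contained.
    if ($username === 'admin' && ($post['delete'] ?? null) === 'Y') {
        $actions['delete'] = true;
    }

    return $actions;
}

// A handcrafted submission from a regular user: the delete field is ignored.
var_dump(requestedActions(['name' => 'widget', 'delete' => 'Y'], 'mallory'));
```

The form then becomes a convenience for the administrator only; the authority to delete lives entirely in the handler.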
A more common example of a hidden interface might occur in an application that uses a value like $_GET['template'] to trigger the inclusion of a PHP script. An attacker might try entering a URI like http://example.org/view.php?template=test or ?template=debug just to see whether the developers happen to have left a debugging template around.
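A common defense against this kind of probing is to compare the requested name against a fixed list of templates that the application is prepared to serve. In the sketch below, the template names, the ALLOWED_TEMPLATES constant, and the resolveTemplate() function are all hypothetical.

```php
<?php
// Hypothetical allowlist: the only template names this script will include.
const ALLOWED_TEMPLATES = ['list', 'detail', 'print'];

// Hypothetical helper: returns the requested template name if it is on the
// allowlist, and a safe default otherwise. $requested is deliberately
// untyped because request values may be strings, arrays, or missing.
function resolveTemplate($requested, string $default = 'list'): string
{
    if (is_string($requested) && in_array($requested, ALLOWED_TEMPLATES, true)) {
        return $requested;
    }
    return $default;
}

var_dump(resolveTemplate('detail')); // string(6) "detail"
var_dump(resolveTemplate('debug'));  // string(4) "list"
```

In a real script the argument would typically be $_GET['template'] ?? null, and only the returned name would ever be used to build the path passed to include.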

Input Bearing Unexpected Commands
The effects of an unexpected command suddenly appearing in a stream of input are highly application-specific. Some commands may simply create harmless PHP errors. It is not difficult, however, to imagine scenarios where carefully crafted user input could bypass authentication routines or initiate downstream applications.
The ways in which commands can be inserted into input include the following:

  • Attackers may inject commands into SQL queries.
  • Any script that sends email is a potential target for spammers, who will probe for ways to use your script to send their own messages.
  • Network socket connections often use escape sequences to change settings or terminate the connection. An attacker might insert escape sequences into values passed over such a connection, which could have highly destructive consequences.
  • Cross-site scripting and remote shell scripting are potentially the most serious kinds of command injection vulnerabilities.
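To take the email case as an example, one common technique spammers use is to smuggle extra header lines (such as Bcc:) into a value that the script places in a message header. A minimal sketch of a guard against that, assuming the value is destined for a single header line, is to refuse anything containing a line break; the function name isSafeHeaderValue() is hypothetical.

```php
<?php
// Hypothetical helper: a value destined for a single email header line must
// not contain a carriage return or line feed, since either could start a
// new, attacker-supplied header.
function isSafeHeaderValue(string $value): bool
{
    return strpbrk($value, "\r\n") === false;
}

var_dump(isSafeHeaderValue("Question about my order"));          // bool(true)
var_dump(isSafeHeaderValue("Hi\r\nBcc: victim@example.com"));    // bool(false)
```

The same idea, deciding which characters a value may legitimately contain before it reaches the downstream system, applies to SQL queries and socket connections as well, although the specific characters differ.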
 