Victor's Blog about PHP, Zend Framework & Cake PHP

This blog is about PHP in general. It tackles various topics related to the language itself especially at the OOP level, discusses various topics related to Zend Framework and shares my experience with Cake PHP. I believe that mastering technology is heavily based on one's ability to discuss its details and share knowledge with others. Technology is a wide wild world after all!

How-To Validate / Sanitize Data in PHP


victor | 07 December, 2012 08:50

In this post, I will tackle a very important (and usually ignored) topic when programming in PHP.

Data validation and sanitization are important not only at the business level but also at the security level, since this is where most attacks take place, especially SQL injection and Cross-Site Scripting attacks.

Assuming you have a piece of data that you have read from a form as follows:

$data = $_POST['data'];

Data sanitization and validation are closely related here. Validation tells you whether the data is valid without changing the value of the data itself. Sanitization, on the other hand, means removing invalid characters from the data to make it sane (as opposed to insane ;).
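
To make the difference concrete, here is a minimal sketch using the email filters described later in this post. The input string is a made-up value standing in for $_POST['data']; it is an assumption for illustration, not real form data.

```php
<?php
// Hypothetical raw input, standing in for $_POST['data'].
$data = 'bob@exam ple.com';

// Validation only reports on the value; $data itself is left untouched.
$valid = filter_var($data, FILTER_VALIDATE_EMAIL);
var_dump($valid); // bool(false), because of the space

// Sanitization returns a cleaned copy with the disallowed characters removed.
$clean = filter_var($data, FILTER_SANITIZE_EMAIL);
var_dump($clean); // string(15) "bob@example.com"

// The sanitized copy can then be validated.
$revalidated = filter_var($clean, FILTER_VALIDATE_EMAIL);
var_dump($revalidated); // string(15) "bob@example.com"
```

Sanitizing first and validating afterwards is only one possible order. Whether silently repairing the input is acceptable depends on the application.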

For this purpose, PHP has a very helpful function: filter_var

filter_var: The Function

filter_var filters a variable with a specified filter.

Its syntax is as follows:
mixed filter_var(mixed $variable [, int $filter = FILTER_DEFAULT [, mixed $options ]])

  • $variable is the value to filter.
  • $filter is the ID of the filter to apply.
  • $options is an associative array of options or bitwise disjunction of flags.


The function returns the filtered data, or FALSE if the filter fails.
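
Because a failure is signalled with FALSE, the result should be compared with === rather than tested loosely; otherwise a valid value such as 0 would look like a failure. The sketch below illustrates this. parse_quantity() is a hypothetical helper written for this example, not a PHP function.

```php
<?php
// Hypothetical helper: returns the validated integer, or null on failure.
function parse_quantity($raw)
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);

    // Strict comparison: int(0) is falsy, but it is a valid result.
    return $value === false ? null : $value;
}

var_dump(parse_quantity('0'));     // int(0)
var_dump(parse_quantity('12'));    // int(12)
var_dump(parse_quantity('12abc')); // NULL
```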

A simple example follows:
<?php
var_dump(filter_var('bob@example.com', FILTER_VALIDATE_EMAIL));
var_dump(filter_var('http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED));
?>


The first line validates bob@example.com as an email address.
Its output will be: string(15) "bob@example.com"

The second line validates http://example.com as a URL that must contain a path (which is not the case here).
Its output is: bool(false)

VALIDATION
The following validation filters are available:

  • FILTER_VALIDATE_BOOLEAN: returns TRUE for "1", "true", "on" and "yes". Returns FALSE otherwise.
    Possible flag: FILTER_NULL_ON_FAILURE
    If FILTER_NULL_ON_FAILURE is set, FALSE is returned only for "0", "false", "off", "no", and "", and NULL is returned for all non-boolean values.
  • FILTER_VALIDATE_EMAIL: Validates value as e-mail.
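  • FILTER_VALIDATE_FLOAT: Validates value as float.
    Possible flag: FILTER_FLAG_ALLOW_THOUSAND
    If the flag is set, a thousands separator (,) is accepted in the value.
  • FILTER_VALIDATE_INT: Validates value as integer, optionally within a range given by the min_range and max_range options.
    Possible flags: FILTER_FLAG_ALLOW_OCTAL, FILTER_FLAG_ALLOW_HEX
    If flags are used, octal notation (leading 0) or hexadecimal notation (leading 0x) is accepted as well.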
  • FILTER_VALIDATE_IP: Validates value as IP address, optionally restricted to IPv4 or IPv6 only, or to addresses outside the private or reserved ranges.
    Possible Flags are: FILTER_FLAG_IPV4, FILTER_FLAG_IPV6, FILTER_FLAG_NO_PRIV_RANGE, FILTER_FLAG_NO_RES_RANGE
  • FILTER_VALIDATE_REGEXP: Validates value against a regular expression, given in the regexp option.
  • FILTER_VALIDATE_URL: Validates value as URL according to RFC 2396.
    Possible flags are: FILTER_FLAG_PATH_REQUIRED, FILTER_FLAG_QUERY_REQUIRED
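
The sketch below shows how options and flags are passed to a few of the validation filters listed above. The sample values and the regular expression are assumptions chosen for illustration.

```php
<?php
// Integer restricted to a range: options go under the "options" key.
$age = filter_var('42', FILTER_VALIDATE_INT, array(
    'options' => array('min_range' => 18, 'max_range' => 99),
));
var_dump($age); // int(42)

// Regular expression: the pattern is passed in the "regexp" option.
$code = filter_var('AB-123', FILTER_VALIDATE_REGEXP, array(
    'options' => array('regexp' => '/^[A-Z]{2}-\d{3}$/'),
));
var_dump($code); // string(6) "AB-123"

// IP address: flags can be combined with the bitwise OR operator.
$flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE;
$private = filter_var('192.168.1.10', FILTER_VALIDATE_IP, $flags);
$public  = filter_var('8.8.8.8', FILTER_VALIDATE_IP, $flags);
var_dump($private); // bool(false), private range rejected
var_dump($public);  // string(7) "8.8.8.8"

// Boolean: with FILTER_NULL_ON_FAILURE, non-boolean input gives NULL.
$yes   = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$off   = filter_var('off', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$maybe = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
var_dump($yes);   // bool(true)
var_dump($off);   // bool(false)
var_dump($maybe); // NULL
```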


SANITIZATION
The following sanitization filters are available:

  • FILTER_SANITIZE_EMAIL: removes all characters except letters, digits and !#$%&'*+-/=?^_`{|}~@.[].
  • FILTER_SANITIZE_ENCODED: URL-encodes a string, optionally strips or encodes special characters.
    Possible flags are: FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_LOW, FILTER_FLAG_ENCODE_HIGH
  • FILTER_SANITIZE_MAGIC_QUOTES: applies addslashes().
  • FILTER_SANITIZE_NUMBER_FLOAT: removes all characters except digits, +- and optionally .,eE.
    Possible flags are: FILTER_FLAG_ALLOW_FRACTION, FILTER_FLAG_ALLOW_THOUSAND, FILTER_FLAG_ALLOW_SCIENTIFIC
  • FILTER_SANITIZE_NUMBER_INT: removes all characters except digits, plus and minus sign.
  • FILTER_SANITIZE_SPECIAL_CHARS: HTML-escapes '"<>& and characters with ASCII value less than 32, optionally strips or encodes other special characters.
    Possible flags are: FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_HIGH
  • FILTER_SANITIZE_FULL_SPECIAL_CHARS: Equivalent to calling htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter is aware of the default_charset and if a sequence of bytes is detected that makes up an invalid character in the current character set then the entire string is rejected resulting in a 0-length string.
    Possible flags are: FILTER_FLAG_NO_ENCODE_QUOTES
  • FILTER_SANITIZE_STRING: Strips tags, optionally strips or encodes special characters.
    Possible flags are: FILTER_FLAG_NO_ENCODE_QUOTES, FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_LOW, FILTER_FLAG_ENCODE_HIGH, FILTER_FLAG_ENCODE_AMP
  • FILTER_SANITIZE_STRIPPED: An alias of the FILTER_SANITIZE_STRING filter.
  • FILTER_SANITIZE_URL: Removes all characters except letters, digits and $-_.+!*'(),{}|\\^~[]`<>#%";/?:@&=.
  • FILTER_UNSAFE_RAW: Does nothing, optionally strips or encodes special characters.
    Possible flags are: FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_LOW, FILTER_FLAG_ENCODE_HIGH, FILTER_FLAG_ENCODE_AMP
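
As a rough illustration of a few of the sanitization filters above, the sketch below runs some made-up input strings through them. All sample values are assumptions for the example.

```php
<?php
// Keeps only digits and the plus and minus signs.
$int = filter_var('abc-42xyz', FILTER_SANITIZE_NUMBER_INT);
var_dump($int); // string(3) "-42"

// Without flags the decimal point and comma are stripped as well.
$plain = filter_var('$1,234.50', FILTER_SANITIZE_NUMBER_FLOAT);
var_dump($plain); // string(6) "123450"

// FILTER_FLAG_ALLOW_FRACTION keeps the decimal point.
$fraction = filter_var('$1,234.50', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);
var_dump($fraction); // string(7) "1234.50"

// Removes characters that are not allowed in a URL, such as the space.
$url = filter_var('http://exam ple.com/?q=1', FILTER_SANITIZE_URL);
var_dump($url); // string(23) "http://example.com/?q=1"

// HTML-escapes the value so it can be printed inside a page.
$html = filter_var('<b>Tom & Jerry</b>', FILTER_SANITIZE_FULL_SPECIAL_CHARS);
echo $html, "\n"; // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

Note that the sanitization filters return strings, so a sanitized number still needs to be validated or cast before it is used as a number.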

 
