Beginner’s Guide to PHP Security: User Validation and Sanitization

Why Secure Your Site?

In this day and age, when the internet is vast and extends sooo far, there are annoying folks who want to be malicious. And that pretty much sums up why you should secure everything. PHP security isn’t just an option anymore; it’s a necessity. Sites are hacked daily, and as you build a site using PHP, you need to know how to keep it safe from the bad guys.

What’s PHP Security?

PHP security means writing your PHP code so that the bad guys can’t gain unauthorized access to your site’s data. It helps you keep your data’s integrity and ensures the site stays available when it’s needed. You can start by validating and sanitizing the data coming into your site, which is what we’ll cover in this article.

Since this is a beginner’s post on basic validation and sanitization, you’ll want to learn more about keeping your site secure. As Master Yoda says, “Much to learn, you still have.”

Validating User Input and Some Sanitization

Validating user input is the first and one of the most important steps to securing your site. Validating means verifying the data coming into your script is the type of data you want, is in the correct format, and is the right length. Without checking these, your site is vulnerable. Depending on what your script does, it can lead to your site going down, displaying bad information, giving the bad guys access to getting information from users, and much more.

Know the Incoming Data

The first step in validating your data is knowing what data should come in. If someone is trying to hack your site, there can be extra data coming in. And if you’re accepting any data coming in, then you’re vulnerable because you’re allowing people to do whatever they want.

Imagine that you’ve got a user form that accepts adding comments on a page. You have fields for someone to add a comment that includes their name, email address, comment, and a hidden field of the page ID they’re commenting on. When the user submits a comment, a script processes the comment and adds it to a database.

Now that we have an idea of what information will be coming to our script, we need to verify that we have the correct data, type of data, a limit on the length of data, and that we aren’t using anything beyond the data we need.

Since this comment form will be sent to our script as a POST variable, we don’t want to loop through each field of the POST without knowing it’s what we want. Here’s an example of a POST variable that is sent to our script:

Array
(
    [name] => Jerry
    [email] => jerryw@fake.dreamhost.com
    [comment] => This is a test comment that is coming to our site
    [submit] => Post Comment
    [page_ID] => 37
)

This shows that we have exactly the data we asked for, but if a hacker wanted to add extra information (like an extra field), then there could be possibilities for corrupting your site. For a form like this, I recommend calling each field, so you know you’re only using what your script needs. For example, instead of looping through $_POST, you can call each field like this:

$_POST[ 'name' ]

$_POST[ 'email' ]

This will help to accept only the data you are expecting and ignore the rest.
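
If the form has more than a couple of fields, the same idea can be wrapped in a small helper. The sketch below is only an illustration: pick_expected_fields() and the extra is_admin field are made-up names, and the $post array stands in for $_POST so the snippet runs on its own.

```php
<?php
// Hypothetical helper: keep only the fields the comment form is expected to send.
function pick_expected_fields(array $input, array $expected): array
{
    $picked = [];
    foreach ($expected as $field) {
        // Skip anything missing or not a plain string (name[]=x arrives as an array).
        if (isset($input[$field]) && is_string($input[$field])) {
            $picked[$field] = $input[$field];
        }
    }
    return $picked;
}

// Sample data standing in for $_POST, with one extra field an attacker added.
$post = [
    'name'     => 'Jerry',
    'email'    => 'jerryw@fake.dreamhost.com',
    'comment'  => 'This is a test comment',
    'page_ID'  => '37',
    'is_admin' => '1',
];

$fields = pick_expected_fields($post, ['name', 'email', 'comment', 'page_ID']);
print_r($fields); // the 'is_admin' entry is not part of the result
```

Everything after this point would then read from $fields instead of $_POST, so an unexpected field never reaches the rest of the script.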

Next, you need to know what the data is supposed to be. For example, the $_POST[ 'page_ID' ] is going to be an integer, because it’s just a page ID that’s a number. So, we know we don’t want to accept any special characters or letters for this. We know that the $_POST[ 'email' ] is an email address, so we want to check the format to make sure it’s a valid email address. In this example, we’ll say that we don’t want to allow comments over 256 characters.

Checking the Type of Data and Cleaning It Up

Now that we know what data we’re accepting, and we know what it’s allowed to be, let’s check the type of data that’s coming in.

Most data that comes in from a post is considered a string. Sometimes you might have fields like currency coming in or a page ID (like in this example), which we know is only supposed to be a number.

First, when we get data coming in, we want to check if the data we need is there. Then, we want to check if it actually has something there. Here’s a way you can check if a field actually came through.

if ( isset( $_POST[ 'name' ] ) ) {
    $name = strip_tags( trim( $_POST[ 'name' ] ) );
}

Here we check if the name is there with the isset() function. This checks that the variable exists and that it is not NULL. I also introduced two other functions, strip_tags() and trim(). The strip_tags() function strips all HTML and PHP tags from a variable. Since we know that name is just the name of a person and does not need links or possibly malicious code, we don’t need any tags. So if a person were to add <a href="http://www.google.com">Jerry</a>, only the string 'Jerry' would be assigned to the variable. The trim() function just strips any white space from the beginning and end of the string. (Note: if you take a look at this function on the PHP website, you can learn about other characters that you can remove with it. For this post, though, we’re just stripping the white space.)
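
Here’s a slightly fuller sketch of that name check, written as a function so the "missing" and "empty" cases are handled in one place. The name clean_name() is hypothetical, and the function takes an array so you can try it without a real form (in a real script you would pass $_POST).

```php
<?php
// Hypothetical helper: returns the cleaned name, or null when the field is missing or empty.
function clean_name(array $input): ?string
{
    // isset() is false for a missing key and for NULL; is_string() rejects array input.
    if (!isset($input['name']) || !is_string($input['name'])) {
        return null;
    }

    // Remove tags first, then trim the white space left at both ends.
    $name = trim(strip_tags($input['name']));

    return $name === '' ? null : $name;
}

var_dump(clean_name(['name' => '  <a href="http://www.google.com">Jerry</a> ']));
var_dump(clean_name(['name' => '   ']));
var_dump(clean_name([]));
```

Returning null for every "no usable name" case means the calling code only needs a single check before it shows an error to the user.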

Next, we’ll check the type of our page ID. There are two ways this can be done (there are a few other ways that I won’t review in this post). First, we can test whether the page ID is made up only of digits by using this:

if ( isset( $_POST[ 'page_ID' ] ) && ctype_digit( $_POST[ 'page_ID' ] ) ) {
    $pageID = (int) $_POST[ 'page_ID' ];
}

This uses the ctype_digit() function from PHP to test if $_POST[ 'page_ID' ] contains nothing but digits. If it does, then it converts the value to an integer and assigns it to $pageID. You might be tempted to reach for is_int() here, but form input always arrives as a string, so is_int( $_POST[ 'page_ID' ] ) would be false even for '37'. There are similar functions that you can use, like is_numeric() and filter_var(), and some others.

The other way to do this is to assign the $_POST[ ‘page_ID’ ] to the variable using type cast.

Here’s an example:

$pageID = (int) $_POST[ 'page_ID' ];

Using (int) forces the page_ID to be an integer. So, if the value coming in is a numeric string like '37', you get the integer 37, and if it doesn’t start with a number at all, you get zero (0). Be aware that a value like '37abc' is cast to 37 rather than rejected. You could then test if the value equals 0, and return an error if it is.
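
One of the "other ways" is PHP’s built-in filter_var() with FILTER_VALIDATE_INT, which validates and converts in a single step. This is a sketch rather than the one right answer: parse_page_id() is a made-up name, and the min_range of 1 assumes page IDs start at 1.

```php
<?php
// Hypothetical helper: returns a positive integer page ID, or null if the input is not one.
function parse_page_id($raw): ?int
{
    // filter_var() returns the integer on success and false on failure.
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    return $id === false ? null : $id;
}

var_dump(parse_page_id('37'));      // a clean numeric string
var_dump(parse_page_id('37abc'));   // digits followed by junk
var_dump(parse_page_id("' OR '1")); // an injection attempt
var_dump((int) '37abc');            // the cast, for comparison
```

Unlike the cast, the filter rejects '37abc' outright instead of quietly turning it into 37.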

Now, let’s take a look at the comment section. People may type tags in a comment, for example to share a link, and we don’t want to silently throw that text away with strip_tags(). Instead, we’ll use the htmlentities() function. This function converts characters to HTML entities, so a tag shows up on the page as plain text instead of being treated as HTML by the browser. For instance, the character ‘<’ would be translated to ‘&lt;’. Here’s an example of how we do this for the comment section:

if ( isset( $_POST[ 'comment' ] ) ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

Here, we check if the comment field came through, and if it did, then we assign it to the variable $comment using the htmlentities(). So, if any tags are included, they will be converted. Let’s say someone adds the link:

<a href="https://www.dreamhost.com/">Awesome Hosting</a>

After it goes through the htmlentities() function above, it will be this:

&lt;a href="https://www.dreamhost.com/"&gt;Awesome Hosting&lt;/a&gt;

This is using the ENT_NOQUOTES option. If you take a look at this function on the PHP website, there are other options, depending on what you’d like to do.
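
To see what one of those other options changes, here’s a small comparison of ENT_NOQUOTES and ENT_QUOTES on the same link. The variable names are just for illustration, and passing 'UTF-8' as the third argument assumes your pages are served as UTF-8.

```php
<?php
$comment = '<a href="https://www.dreamhost.com/">Awesome Hosting</a>';

// ENT_NOQUOTES converts < > and & but leaves both kinds of quote characters alone.
$noQuotes = htmlentities($comment, ENT_NOQUOTES, 'UTF-8');

// ENT_QUOTES also converts " and ', which matters if the value ever ends up
// inside an HTML attribute such as value="...".
$withQuotes = htmlentities($comment, ENT_QUOTES, 'UTF-8');

echo $noQuotes, "\n";
echo $withQuotes, "\n";
```

If you are not sure where a value will be printed, ENT_QUOTES is the more cautious choice of the two.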

Checking the Length of Variables

This might not seem critical, but checking the length of variables is quite important. Without a length check, a user can send far more data than you planned for, which wastes resources and can cause errors further down the line. Not only that, but say the comment column in your database table can only hold 256 characters. If a user types 356 characters, then the insert may fail or part of their post will be cut off. If you check length, you can let the user know that they need to shorten their comment.

To check the length of a string, use the function strlen(). This function returns the length of a string for you. Here’s an example:

if ( strlen( $_POST[ 'comment' ] ) <= 256 ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

Here, we check if the length of $_POST[ ‘comment’ ] is shorter than or equal to 256. If it is, then we assign it to the variable. Another option to check that the string length is enough to be a comment is something like this:

if ( strlen( $_POST[ 'comment' ] ) >= 1 && strlen( $_POST[ 'comment' ] ) <= 256 ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

This checks to make sure the comment is at least one character long and no longer than 256.
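
Two details are easy to miss here: strlen() counts bytes rather than characters, and a comment made only of spaces still passes the "at least one" test if you measure it before trimming. A possible way to handle both is sketched below; validate_comment_length() is a hypothetical name, and the fallback to strlen() is only there in case the mbstring extension isn’t installed.

```php
<?php
// Hypothetical helper: returns the trimmed comment, or null if its length is out of bounds.
function validate_comment_length(string $raw, int $min = 1, int $max = 256): ?string
{
    // Trim first so a comment made only of spaces counts as empty.
    $comment = trim($raw);

    // mb_strlen() counts characters; strlen() counts bytes.
    $length = function_exists('mb_strlen')
        ? mb_strlen($comment, 'UTF-8')
        : strlen($comment);

    if ($length < $min || $length > $max) {
        return null;
    }

    return $comment;
}

var_dump(validate_comment_length('  Nice post!  '));
var_dump(validate_comment_length('     '));
var_dump(validate_comment_length(str_repeat('a', 257)));
```

When the function returns null, that’s your cue to tell the user the comment is empty or too long, instead of cutting it off.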

Is the Format Correct From the User?

Making sure the format is correct matters for two reasons: it verifies that the information can be used correctly later, and it lets you catch bad input early and show the user a helpful error. In this post, we’ll use the PHP function preg_match() with regular expressions to accomplish this.

Before I get into actual commands that we’ll use, I wanted to mention that I won’t go into explaining regular expressions in this post. If you’d like to learn about it, there are tutorials online. For this post, we’ll be grabbing our regular expressions from http://regexlib.com, which has quite a few regular expressions that you can use.

The preg_match() function searches a variable for a regular expression pattern to see if it matches. For example, let’s check if our email address is a valid email address:

if ( preg_match( '/^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/', $_POST[ 'email' ] ) ) {
    $emailAddress = trim( $_POST[ 'email' ] );
}

This takes a regular expression, which in this case is '^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$', and checks the $_POST[ 'email' ] to verify that it matches that pattern. (The pattern on regexlib.com writes the first character class as [\w-\.]. Newer PHP versions, 7.3 and later, can reject that form as an invalid range, so the hyphen is moved to the end of the class here.) According to the regexlib.com site, it will match the following formats:

joe@aol.com | joe@wrox.co.uk | joe@domain.info

If the user does not use one of the above formats, then it will return false, and the $emailAddress variable will not get assigned the email address. The preg_match() function makes it possible to check the format of any variable, as long as the regular expression is correct. To use it, follow the same shape as above:

if ( preg_match( '/<ENTER EXPRESSION HERE>/', <INSERT VARIABLE HERE> ) )

When using it this way, make sure to add the forward slashes to the front and back of the regular expression as shown above.

Regular expressions are quite powerful and can be used to test a lot of different patterns. You can test everything from phone numbers to the character types.  Take a look at http://regexlib.com for more patterns that you can use.
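
For email addresses in particular, PHP also ships a ready-made check: filter_var() with FILTER_VALIDATE_EMAIL. The sketch below shows one way to use it; parse_email() is a made-up helper name. One practical difference from the pattern above is that the {2,4} at the end of that pattern limits the last part of the domain to four characters, so longer endings don’t match it.

```php
<?php
// Hypothetical helper: returns the trimmed email address, or null if it is not valid.
function parse_email($raw): ?string
{
    if (!is_string($raw)) {
        return null;
    }

    // filter_var() returns the address on success and false on failure.
    $email = filter_var(trim($raw), FILTER_VALIDATE_EMAIL);

    return $email === false ? null : $email;
}

var_dump(parse_email('joe@wrox.co.uk'));
var_dump(parse_email(' joe@example.technology '));
var_dump(parse_email('not-an-email'));
```

Neither approach proves that the mailbox exists. They only tell you the address is shaped like an email address.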

Sanitizing Your Data: A Little More Information

Sanitizing data is another essential element of PHP security. In our last section, Validating User Input and Some Sanitization, we did some sanitization as part of cleaning. We validated our data by checking if it matched the data that we wanted. Here are two more tips to help protect your site from the bad guys.

MySQL Injection

What’s MySQL Injection? Basically, it’s when bad guys try to manipulate your site to add SQL into your SQL command to get more information, modify, or delete data in your database. Here’s an example of a simple SQL injection:

$userID = $_POST[ 'user_id' ]; // The attacker sends the value: ' OR '1

$query = "SELECT * FROM users WHERE user_id = '$userID'";

// Resulting query: SELECT * FROM users WHERE user_id = '' OR '1'

This example shows a script that has not been secured: its creator put $_POST[ 'user_id' ] straight into the SQL for the site. Some bad guy came along and decided to change the value in the hidden form field from a number to ' OR '1. Now, a query that was meant to fetch one user actually pulls every user from the table, because the condition WHERE user_id = '' OR '1' is true for every row.

Wow, so how do we stop this trickery? Luckily, this is a beginner’s guide, so we have the perfect beginner’s method for you!

PHP’s original helper for this was mysql_real_escape_string(), which helps prevent injection by escaping the special characters in a value. That function was deprecated in PHP 5.5 and removed in PHP 7.0 along with the rest of the old mysql extension, so on current PHP you’d use its replacement, mysqli_real_escape_string(), which takes the database connection as its first argument. Before you use this function, you should still validate all the data and sanitize it, to make sure it’s clean. Let’s say we validated all the data for our comment form, and now we want to add it to the database. But let’s also say I’m a bad guy and try to inject some secret stuff into your site maliciously. So, I put the page_ID as ' OR '1 like we talked about earlier, and you forgot to sanitize the page_ID. (I know you wouldn’t really forget to do that.)

Since we used our escaping function, we prevented the injection. Here’s an example, where $db is an open mysqli connection:

$pageID = mysqli_real_escape_string( $db, $_POST[ 'page_ID' ] ); // The attacker sends the value: ' OR '1

$query = "SELECT * FROM pages WHERE page_id = '$pageID'";

// Resulting query: SELECT * FROM pages WHERE page_id = '\' OR \'1'

As you can see from the output, ' OR '1 became \' OR \'1, so the whole value stays inside the quotes and can’t change the WHERE clause, which stops extra data from coming out. Again, this is a first step to stopping injection and I suggest reading more about preventing this. PHP has other methods of accessing a database, such as prepared statements with PDO or mysqli.
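
One of those other methods is a prepared statement, where the SQL and the user’s value are sent separately, so the value is never pasted into the query text. The following is a minimal sketch using PDO. To keep it runnable without a MySQL server, it assumes the PDO SQLite driver is available and uses an in-memory database. The pages table, its columns, and find_pages() are made up for the example. With MySQL you would change the connection string and keep the rest.

```php
<?php
// In-memory SQLite stands in for a real MySQL connection (an assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table with two sample rows.
$pdo->exec('CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT)');
$pdo->exec("INSERT INTO pages (page_id, title) VALUES (37, 'Home')");
$pdo->exec("INSERT INTO pages (page_id, title) VALUES (38, 'About')");

// Hypothetical helper: the :page_id placeholder is filled in by the driver,
// so the value can never become part of the SQL itself.
function find_pages(PDO $pdo, string $pageId): array
{
    $stmt = $pdo->prepare('SELECT page_id, title FROM pages WHERE page_id = :page_id');
    $stmt->execute([':page_id' => $pageId]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$normal   = find_pages($pdo, '37');
$injected = find_pages($pdo, "' OR '1");

echo count($normal), " row(s) for page 37\n";
echo count($injected), " row(s) for the injection attempt\n";
```

The injection attempt is simply treated as a strange page ID that matches nothing, which is the behavior you want.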

Just a Bit of Cross-Site Injection

Since we went over so much already, I wanted to write an extra bit on Cross-Site injection, which you’ll usually see called cross-site scripting (XSS). It’s when the bad guys inject data into your site, which will later be sent to the client-side, to maliciously get data from users, modify your site in a way to change data, or delete data. Cross-Site injection is a huge security vulnerability. So, how can you help to prevent this from happening?

Well, first you can use that trusty htmlentities() function that we used earlier. Using this ensures that any data that you echo out will be safer so pesky hackers won’t be able to inject into your site. For example, let’s say that a user visits your site, comments on your page, and adds the following as their comment:

<iframe src="http://bad-dude-hacker-mafia.com/xss-injection.php" height=0 width=0 />

If we did nothing to protect our site, and this was displayed on the page every time someone viewed it, they could collect data, show information on your site, and so forth. But, if we use our htmlentities() function, we can prevent this:

echo htmlentities ( trim ( $comment ) , ENT_NOQUOTES );

//Output: &lt;iframe src="http://bad-dude-hacker-mafia.com/xss-injection.php" height=0 width=0 /&gt;

As you can see by the output, this might display as text, but it won’t actually open the bad-dude-hacker-mafia.com site and no havoc has been caused.
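
Since you’ll want this on every value you print, it can help to wrap the call in a tiny function and use it at the moment of output. This is just one way to organize it: the helper name e() and the class="comment" markup are made up for the example, and ENT_QUOTES is used so quotes get converted as well.

```php
<?php
// Hypothetical helper: escape a value right before it is written into the page.
function e(string $value): string
{
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// The malicious comment from the example above, as it arrived from the user.
$comment = '<iframe src="http://bad-dude-hacker-mafia.com/xss-injection.php" height=0 width=0 />';

$safe = e($comment);

// The browser shows the tag as text instead of loading the iframe.
echo '<p class="comment">', $safe, '</p>', "\n";
```

One thing to watch for: if a value was already run through htmlentities() when it came in, escaping it again on the way out will double-encode it (you’d see &amp;lt; on the page), so it’s worth picking one place to escape and sticking to it.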

Yoda for the Road

Okay! Now you’ve learned how to protect your PHP site using validation, sanitization, MySQL injection prevention, and some Cross-Site injection skills. Remember, this is only the beginning. There’s lots of information online to help protect your site, and the more you know, the safer you are.

Pass on what you have learned, Luke. There is… another… Sky… walker.

Yeah, Yoda’s pretty wise.

About the author

DreamHost

Leaders in web hosting, domain registration, and cloud services for individuals, small businesses, and developers!

3 Comments

  • Bad practices and errors. Delete those examples or fix them, because some beginner might just copy/paste your example code and spend hours trying to figure out why they don’t work or worse – seem to work but actually don’t.

    Start with is_int
    It actually states at the beginning of manual:
    “To test if a variable is a number or a numeric string (such as form input, which is always a string), you must use is_numeric().”
    http://php.net/manual/en/function.is-int.php

    Cheers..

  • An alternative to using regular expressions to check for email address validity is to use the function filter_var:

    filter_var("someone@example.com", FILTER_VALIDATE_EMAIL)

    The following manual entry contains a list of valid filters that work with the above method
    http://us3.php.net/manual/en/filter.filters.validate.php

    Whilst filter_var internally uses a regular expression match of its own, it does unfortunately execute slower than using the preg_match function. My own tests show it runs about 10 times slower.

    So if you’re unfamiliar with regular expressions, then it might be a good idea to use filter_var if you are stuck. Otherwise, preg_match is a faster solution.