Part 1: PHP Security: User Validation and Sanitization for Beginners

First and foremost: you’ll need a working knowledge of PHP to read this post. Not much is needed, though, since this is intended for beginners.

Why Secure your Site

In this day and age, when the internet is vast and extends sooo far, there are annoying folks who want to be malicious. And that pretty much sums up why you should secure everything. PHP security isn’t just an option anymore; it’s a necessity. Sites are hacked daily, and as you build a site using PHP, you need to know how to keep it safe from the bad guys.

What’s PHP security?

PHP security means securing the site you build in PHP, to help prevent the bad guys from gaining unauthorized access to your site’s data. It helps you keep your data’s integrity and ensures availability as needed. You can start doing this in PHP by validating and sanitizing data on your site, which is what I’ll be sharing in this article.

Since this is a beginner’s post on basic validation and sanitization, you’ll want to extend farther out from this post to learn more about keeping your site secure. As Master Yoda says, “Much to learn, you still have.”

Validating User Input and Some Sanitization

Validating user input is the first (and one of the most important) steps to securing your site. Validating means verifying that the data coming into your script is the type of data you want, is in the correct format, and is the right length. Without checking these, your site is vulnerable. Depending on what your script does, it can lead to your site going down, displaying bad information, giving the bad guys access to information from users, and much more.

Know the incoming data

The first step in validating your data is knowing what data should come in. If someone is trying to hack your site, there can be extra data coming in. And if you’re accepting any data coming in, then you’re vulnerable because you’re allowing people to do whatever they want.

Imagine that you’ve got a user form that accepts adding comments on a page. You have fields for someone to add a comment that includes their name, email address, comment, and a hidden field of the page ID they’re commenting on. When the user submits a comment, a script processes the comment, and adds it to a database.

Now that we have an idea of what information will be coming to our script, we need to verify that we have the correct data, type of data, a limit on the length of data, and that we aren’t using anything beyond the data we need.

Since this comment form will be sent to our script as a POST variable, we don’t want to loop through each field of the POST without knowing it’s what we want. Here’s an example of a POST variable that is sent to our script:

Array
(
    [name] => Jerry
    [email] => jerryw@fake.dreamhost.com
    [comment] => This is a test comment that is coming to our site
    [submit] => Post Comment
    [page_ID] => 37
)

This shows that we have exactly the data we asked for, but if a hacker wanted to add extra information (like an extra field), then there could be possibilities for corrupting your site. For a form like this, I recommend calling each field, so you know you’re only using what your script needs. For example, instead of looping through $_POST, you can call each field like this:

$_POST[ 'name' ]
$_POST[ 'email' ]

This will help to accept only the data you are expecting and ignore the rest.
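If you would rather filter the whole array in one step, here is a minimal sketch of the same idea. The helper name pick_expected_fields() and the extra is_admin field are made up for illustration, and $post stands in for $_POST.

```php
<?php
// Hypothetical helper: keep only the form fields the script expects.
function pick_expected_fields( array $input, array $expected ) {
    // array_flip() turns the list of names into array keys so that
    // array_intersect_key() can drop every key that is not on the list.
    return array_intersect_key( $input, array_flip( $expected ) );
}

// Sample data standing in for $_POST, with one unexpected field added.
$post = array(
    'name'     => 'Jerry',
    'email'    => 'jerryw@fake.dreamhost.com',
    'comment'  => 'This is a test comment that is coming to our site',
    'submit'   => 'Post Comment',
    'page_ID'  => '37',
    'is_admin' => '1', // not part of our form
);

// Only the four fields we asked for survive.
$fields = pick_expected_fields( $post, array( 'name', 'email', 'comment', 'page_ID' ) );
```

Anything a visitor adds beyond the listed names never reaches the rest of the script.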

Next, you need to know what the data is supposed to be. For example, the $_POST[ 'page_ID' ] is going to be an integer, because it’s just a page id that’s a number. So, we know we don’t want to accept any special characters or letters for this. We know that the $_POST[ 'email' ] is an email address, so we want to check the format to make sure it’s a valid email address. In this example, we’ll say that we don’t want to allow comments over 256 characters.

Checking the type of data and cleaning it up

Now that we know what data we’re accepting, and we know what it’s allowed to be, let’s check the type of data that’s coming in.

Most data that comes in from a post is considered a string. Sometimes you might have fields like currency coming in, or a page id (like in this example), which we know is only supposed to be a number.

First, when we get data coming in, we want to check if the data we need is there. Then, we want to check if it actually has something there. Here’s a way you can check if a field actually came through.

if ( isset( $_POST[ 'name' ] ) ) {
    $name = strip_tags( trim( $_POST[ 'name' ] ) );
}

Here we check if the name is there with the isset() function. This checks if the variable is there and also verifies that the variable is not NULL. I also introduced two other functions: strip_tags() and trim(). The strip_tags() function strips all HTML and PHP tags from a variable. Since we know that name is just the name of a person, and does not need links or possibly malicious code, we don’t need any tags. So if a person were to add <a href="http://www.google.com">Jerry</a>, only the string ‘Jerry’ would be assigned to the variable. The trim() function just strips any white space from the beginning and end of the string (note: if you take a look at this function on the PHP website, you can learn about other characters that you can remove with this function. For this post, though, we’re just stripping the white space).
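Here is a rough sketch that rolls both checks (the field came through, and it actually has something in it) into one place. The helper name clean_text_field() is hypothetical, and $post stands in for $_POST.

```php
<?php
// Hypothetical helper: return a trimmed, tag-free string, or null when the
// field is missing, is not a string, or is empty after cleaning.
function clean_text_field( array $input, $field ) {
    if ( ! isset( $input[ $field ] ) || ! is_string( $input[ $field ] ) ) {
        return null;
    }
    // Strip tags first, then trim the white space left around the text.
    $value = trim( strip_tags( $input[ $field ] ) );
    return $value === '' ? null : $value;
}

// Sample data standing in for $_POST.
$post = array( 'name' => '  <a href="http://www.google.com">Jerry</a>  ' );

$name  = clean_text_field( $post, 'name' );  // the tags and spaces are removed
$email = clean_text_field( $post, 'email' ); // null, the field never came through
```

Returning null for both "missing" and "empty" keeps the calling code down to a single check per field.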

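Next we’ll check the type of our page ID. There are two ways this can be done (there are a few other ways that I won’t review in this post). First, we can test whether the page ID is an integer. One catch: values in $_POST always arrive as strings, so is_int( $_POST[ 'page_ID' ] ) would be false even for the value 37. The filter_var() function with FILTER_VALIDATE_INT checks whether the string holds an integer instead:

if ( isset( $_POST[ 'page_ID' ] ) )
    $pageID = filter_var( $_POST[ 'page_ID' ], FILTER_VALIDATE_INT );

This uses the filter_var() function from PHP to test if the $_POST[ 'page_ID' ] is a valid integer. If it is, $pageID gets the integer; if not, $pageID is false. PHP also has type-checking functions like is_int(), is_bool(), is_float(), is_numeric(), and some others. Keep in mind that most of them look at the variable’s type, so on raw form data only is_numeric() is useful. There’s plenty of information on the PHP website (link at the bottom of this post).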
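One thing worth seeing for yourself is how these checks behave on string input, since form values reach the script as strings. A small sketch, with $raw standing in for a submitted page ID:

```php
<?php
// Form values arrive as strings, even when they look numeric.
$raw = '37';

// is_int() looks at the variable's type, so a numeric string fails.
$isInt = is_int( $raw );

// is_numeric() accepts numeric strings.
$isNumeric = is_numeric( $raw );

// filter_var() with FILTER_VALIDATE_INT returns the integer on success...
$asInt = filter_var( $raw, FILTER_VALIDATE_INT );

// ...and false when the string is not a clean integer.
$rejected = filter_var( '37abc', FILTER_VALIDATE_INT );
```

Here $isInt is false while $isNumeric is true, which is why a check on the variable type alone is not enough for form data.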

The other way to do this is to assign the $_POST[ 'page_ID' ] to the variable using a type cast.

Here’s an example:

$pageID = (int) $_POST[ 'page_ID' ];

Using (int) forces the page_ID to be an integer. The value arrives as a string, so the cast converts it: a numeric string such as 37 becomes the integer 37, and a string that does not start with a number becomes zero (0). You could then test if the value equals 0, and return an error if it is.
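As a sketch of that test-for-zero step, assuming page IDs are always positive numbers (the helper name parse_page_id() is hypothetical):

```php
<?php
// Hypothetical helper: cast to int and treat anything that is not a
// positive number as "no valid page ID".
function parse_page_id( $raw ) {
    // Arrays and objects cannot be a page ID, so reject them up front.
    if ( ! is_scalar( $raw ) ) {
        return null;
    }
    $id = (int) $raw;
    return $id > 0 ? $id : null;
}

$valid   = parse_page_id( '37' );    // a numeric string is converted
$invalid = parse_page_id( 'abc' );   // casts to 0, so it is rejected
$partial = parse_page_id( '12abc' ); // the cast keeps the leading digits
```

Note the last case: the cast quietly accepts input that starts with digits, so it is more forgiving than a strict validation check.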

Now, let’s take a look at the comment section. Commenters may type tags, for example to add a link, and we don’t want strip_tags() to silently take their <a> tag out. Instead, we will use the htmlentities() function. This function converts characters to HTML entities, so the tag is kept as plain text rather than treated as markup. For instance, the character ‘<’ would be translated to ‘&lt;’. Here’s an example of how we do this for the comment section:

if ( isset( $_POST[ 'comment' ] ) ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

Here, we check if the comment field came through, and if it did, we assign it to the variable $comment after running it through htmlentities(). So, if any tags are included, they will be converted. Let’s say someone adds the link:

<a href="http://dreamhost.com/">Awesome Hosting</a>

After it goes through the htmlentities() function above, it will be this:

&lt;a href="http://dreamhost.com/"&gt;Awesome Hosting&lt;/a&gt;

This is using the ENT_NOQUOTES option. If you take a look at this function on the PHP website, there are other options, depending on what you’d like to do.
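To see what the option changes, here is a small comparison of ENT_NOQUOTES with ENT_QUOTES on the same link. Passing 'UTF-8' as the encoding is my assumption about the site’s character set.

```php
<?php
$comment = '<a href="http://dreamhost.com/">Awesome Hosting</a>';

// ENT_NOQUOTES converts < and > (and &) but leaves quotes alone.
$noQuotes = htmlentities( $comment, ENT_NOQUOTES, 'UTF-8' );

// ENT_QUOTES also converts double and single quotes.
$withQuotes = htmlentities( $comment, ENT_QUOTES, 'UTF-8' );

echo $noQuotes, "\n";
echo $withQuotes, "\n";
```

Converting quotes as well matters if the value might ever be printed inside an HTML attribute, where a stray quote could end the attribute early.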

Checking the length of variables

This might not seem critical, but checking the length of variables is quite important. Without checking length, a user could submit far more data than your script or database is prepared to handle. Not only that, but suppose the column in your database that stores the comment can only hold 256 characters. If a user types 356 characters, then part of their post will be cut off. If you check length, you can let the user know that they need to shorten their comment.

To check the length of a string, use the function strlen(). This function returns the length of a string for you. Here’s an example:

if ( strlen( $_POST[ 'comment' ] ) <= 256 ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

Here, we check if the length of $_POST[ ‘comment’ ] is shorter than or equal to 256. If it is, then we assign it to the variable. Another option to check that the string length is enough to be a comment is something like this:

if ( strlen( $_POST[ 'comment' ] ) >= 1 && strlen( $_POST[ 'comment' ] ) <= 256 ) {
    $comment = htmlentities( trim( $_POST[ 'comment' ] ), ENT_NOQUOTES );
}

This checks to make sure the comment is at least one character long and no more than 256.
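To let the user know what went wrong, the length check can return a message instead of just skipping the assignment. This is only a sketch: the helper name check_comment_length() and the message wording are made up, and since strlen() counts bytes, text with multi-byte characters may need a different length function.

```php
<?php
// Hypothetical helper: return an error message for the user, or null when
// the comment length is acceptable.
function check_comment_length( $comment, $max = 256 ) {
    // Measure the comment after trimming surrounding white space.
    $length = strlen( trim( $comment ) );

    if ( $length < 1 ) {
        return 'Please enter a comment.';
    }
    if ( $length > $max ) {
        return "Please shorten your comment to $max characters or fewer.";
    }
    return null;
}

$okError      = check_comment_length( 'This is a test comment' ); // null
$tooLongError = check_comment_length( str_repeat( 'a', 356 ) );   // a message
```

The script can then show the message next to the form rather than silently dropping the comment.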

Is the format correct from the user?

Making sure the format is correct matters both so the information can be used correctly later and for error control on your site. In this post, we’ll use the PHP function preg_match() with regular expressions to accomplish this.

Before I get into the actual commands that we’ll use, I wanted to mention that I won’t go into explaining regular expressions in this post. If you’d like to learn about them, there are tutorials online. For this post, we’ll be grabbing our regular expressions from http://regexlib.com, which has quite a few regular expressions that you can use.

The preg_match() function searches a variable for a regular expression pattern to see if it matches. For example, let’s check if our email address is a valid email address:

if ( preg_match( '/^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/', $_POST[ 'email' ] ) ) {
    $emailAddress = trim( $_POST[ 'email' ] );
}

This takes a regular expression, which in this case is ‘^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$’, and checks the $_POST[ 'email' ] to verify that it matches that pattern. (The hyphen sits at the end of the first character class because PHP 7.3 and later reject the [\w-\.] form as an invalid range.) According to the regexlib.com site, it will match the following formats:

joe@aol.com | joe@wrox.co.uk | joe@domain.info

If the user’s input does not match one of the above formats, preg_match() returns 0 (which the if statement treats as false), and the $emailAddress variable will not get assigned the email address. The preg_match() function makes it possible to check the format of any variable, as long as the regular expression is correct. The general form looks like this:

if ( preg_match( '/<ENTER EXPRESSION HERE>/', <INSERT VARIABLE HERE> ) )

When using it this way, make sure to add the forward slashes to the front and back of the regular expression as shown above.
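Here is a minimal sketch of wrapping that call so the rest of the script gets a plain true or false. The helper name matches_format() is hypothetical. The pattern is the email expression from this post, written with the hyphen at the end of the first character class.

```php
<?php
// Hypothetical helper: true when preg_match() finds $pattern in $value.
function matches_format( $pattern, $value ) {
    // preg_match() returns 1 on a match, 0 on no match, and false on error,
    // so compare against 1 explicitly.
    return is_string( $value ) && preg_match( $pattern, $value ) === 1;
}

// The forward slashes at the front and back are the pattern delimiters.
$emailPattern = '/^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/';

$good = matches_format( $emailPattern, 'joe@wrox.co.uk' ); // matches
$bad  = matches_format( $emailPattern, 'joe@aol' );        // no dot after the @ part
```

Comparing against 1 keeps a broken pattern (where preg_match() returns false) from being mistaken for "no match, but everything is fine".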

Regular expressions are quite powerful and can be used to test a lot of different patterns. You can test everything from phone numbers to character types. Take a look at http://regexlib.com for more patterns that you can use.

So far, I have gone over some beginning methods for securing your site, so stay tuned for Part 2 coming up next week!