Regex Validation in PHP
Validation is a common concern for webmasters everywhere. If someone is entering information into your form, you need to validate it! But what is the simplest way to do so? Well, whenever I deal with large amounts of text that must follow a pattern, I always turn to regular expressions. A regular expression describes a pattern you define, and the regex engine tests whether a piece of text matches it.
While I won't teach you regex, I will tell you about regex in PHP. PHP predominantly uses the same flavor of regex that Perl does (PCRE, Perl Compatible Regular Expressions). This is why the function that performs a regular expression match in PHP is preg_match. Perform a Perl regular expression match, get it? Anyway, there are certain modifiers that can be added to the pattern in PHP to affect how it performs.
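As a minimal sketch of what calling preg_match looks like, the helper below reports its three possible results: 1 for a match, 0 for no match, and false on error. The function name describe_match and the five-digit pattern are made up for illustration.

```php
<?php
// Hypothetical helper: describe the three possible outcomes of preg_match().
function describe_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // false signals an error, for example a malformed pattern.
        return 'error';
    }
    // $matches[0] holds the text matched by the whole pattern.
    return $result === 1 ? "match: {$matches[0]}" : 'no match';
}

echo describe_match('/^[0-9]{5}$/', '90210'), "\n"; // prints "match: 90210"
echo describe_match('/^[0-9]{5}$/', '9021a'), "\n"; // prints "no match"
```

Comparing with === matters here, because 0 and false both look "falsy" in a plain if test.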
For example, adding "i" makes the pattern case insensitive. I had to do this for the email address pattern, because the pattern lists only uppercase letters to stay short and readable, on the assumption that you use it with case insensitivity. Then there is the "m" modifier, which treats a multi-line subject as actually multi-lined. Without it, newlines (\n) get no special treatment and the anchors (^ and $) refer to the beginning and end of the entire subject. "S" (uppercase) asks PHP to spend extra time analyzing the pattern, which is useful if you are constantly using a regex pattern that does not start with ^ and does not have a consistent starting character. Do not confuse it with the lowercase "s", which makes the dot match newlines as well.
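To make the effect of the "i" and "m" modifiers concrete, here is a small sketch. The two-line subject string is an invented example.

```php
<?php
// Invented two-line subject for demonstration.
$subject = "first line\nSecond line";

// Without "i", the uppercase "S" in the subject does not match "second".
var_dump(preg_match('/second/', $subject));   // int(0)
var_dump(preg_match('/second/i', $subject));  // int(1)

// Without "m", ^ only matches at the very start of the whole subject.
var_dump(preg_match('/^Second/', $subject));  // int(0)
var_dump(preg_match('/^Second/m', $subject)); // int(1)
```

Modifiers go after the closing delimiter and can be combined, as in '/^second/im'.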
In order to determine if the username, password, and verification password are correct, I assume that only alphanumeric characters are allowed. Therefore, instead of searching for everything that could be a potential wrong character, I search for characters that are not alphanumeric. This is indicated by the ^ character in the character class [^A-Za-z0-9]. Then, later on, if there is a match, I know that there is an invalid character in the field.
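The negated character class check can be wrapped in a small helper, sketched here. The name is_alphanumeric is hypothetical, and rejecting the empty string is my own assumption, since an empty field contains no bad characters and would otherwise pass.

```php
<?php
// Hypothetical helper: true when $value contains only ASCII letters and digits.
function is_alphanumeric(string $value): bool
{
    // An empty string has no invalid characters, so reject it explicitly (assumption).
    if ($value === '') {
        return false;
    }
    // A match means at least one character outside A-Z, a-z, 0-9 was found.
    return preg_match('/[^A-Za-z0-9]/', $value) === 0;
}

var_dump(is_alphanumeric('user42'));    // bool(true)
var_dump(is_alphanumeric('bad name!')); // bool(false)
```

Note that the logic is inverted compared to most validation patterns: here a successful match is the failure case.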
Also keep in mind that there are metacharacters that need to be escaped in regex. When you build a pattern out of arbitrary text, I recommend preg_quote for this.
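A short sketch of preg_quote in use, assuming you want to search for a literal string that may contain metacharacters. The helper name contains_literal is made up for this example.

```php
<?php
// Hypothetical helper: check whether a literal string appears in a subject.
function contains_literal(string $needle, string $subject): bool
{
    // The second argument makes preg_quote escape the "/" delimiter as well.
    $pattern = '/' . preg_quote($needle, '/') . '/';
    return preg_match($pattern, $subject) === 1;
}

var_dump(contains_literal('1+1', 'what is 1+1?')); // bool(true)
var_dump(contains_literal('a.c', 'abc'));          // bool(false), the dot is literal
```

Without preg_quote, the "+" and "." in these needles would be read as regex operators rather than plain characters.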
The email and URL values are more difficult, especially because these are standards that are hard to define. For example, on regular-expressions.info, quite a lengthy debate has gone into the email regex. Technically, a valid email address can contain apostrophes. For the majority of email addresses, what is listed will be valid. URLs contain a similar problem. Top-level domains are numerous, and a list is hard to aggregate. People also have a tendency to just say "abc.com" instead of "http://abc.com". There are also situations where the top-level domain contains a value for the country as well, like "google.co.uk".
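To see where a simple email pattern draws the line, the sketch below runs the pattern used in this article (with the dot before the top-level domain escaped) against a few sample addresses. The addresses are invented for illustration.

```php
<?php
// The email pattern from this article, with the dot before the TLD escaped.
$emailPattern = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

// Invented sample addresses, chosen to probe the pattern's limits.
$samples = array(
    'user@google.co.uk',   // country-code domain: accepted
    "o'brien@example.com", // apostrophe in the local part: rejected
    'user@example.museum', // top-level domain longer than four letters: rejected
);

foreach ($samples as $address) {
    $ok = preg_match($emailPattern, $address) === 1;
    echo $address, ' => ', ($ok ? 'accepted' : 'rejected'), "\n";
}
```

The second and third addresses are technically valid, which is exactly the trade-off described above: the pattern covers the majority of addresses, not all of them.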
I cannot advise attempting to accept every possible input. If you try to cater to everyone, you end up spending so much time on validation that it is no longer feasible. Validate user input according to what is expedient in your code, not according to everything users might enter.
Also, regex is not the solution to every problem (though it is for most). If you are validating something that has many exceptions or is extremely complicated, it becomes a chore. I can't even imagine trying to format dates and times. Is 12:00 noon or midnight? Is 12:60 valid? What about 17:00? Do you require AM and PM to be entered? Does 3 mean the same as 3:00? These things are better left to either humans or some other method you can devise.
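As one possible "other method" for times, here is a sketch that swaps regex for PHP's DateTime class. It assumes you have already decided to require zero-padded 24-hour input such as "17:00"; that decision, and the helper name is_valid_24h_time, are my assumptions rather than anything this article prescribes.

```php
<?php
// Hypothetical helper: accept only zero-padded 24-hour times such as "17:00".
function is_valid_24h_time(string $input): bool
{
    // The leading "!" resets the date part to the Unix epoch so today's date plays no role.
    $parsed = DateTime::createFromFormat('!H:i', $input);
    // Out-of-range values either fail to parse or roll over to a different time,
    // so formatting the result back must reproduce the input exactly.
    return $parsed !== false && $parsed->format('H:i') === $input;
}

var_dump(is_valid_24h_time('17:00')); // bool(true)
var_dump(is_valid_24h_time('12:60')); // bool(false)
var_dump(is_valid_24h_time('noon'));  // bool(false)
```

This only answers whether the text is a well-formed time. Questions like whether 12:00 means noon or midnight still have to be settled by the format you ask users for.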
<?php
// Collect the submitted form fields.
$un = $_POST["user"];
$pw = $_POST["pass"];
$pw2 = $_POST["pass2"];
$em = $_POST["email"];
$url = $_POST["url"];
$info = array($un, $pw, $pw2, $em, $url);
foreach ($info as $idx => $unit) {
    switch ($idx) {
        case 4:
            // URL: optional scheme and credentials, then host, port, and path
            $rgx = '/^((ftp|http|https):\/\/(\w+:{0,1}\w*@)?)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?$/';
            break;
        case 3:
            // email address; relies on the "i" modifier to accept lowercase letters
            $rgx = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';
            break;
        case 2:
            //read below
            $rgx = "/[^A-Za-z0-9]+/";
            break;
        case 1:
            //read below
            $rgx = "/[^A-Za-z0-9]+/";
            break;
        case 0:
            //if characters are NOT normal
            $rgx = "/[^A-Za-z0-9]+/";
            break;
        default:
            echo "???";
            die("$unit ?");
    }
    $n = preg_match($rgx, $unit, $matches);
    // Escape the value before echoing it back into the page.
    $shown = htmlspecialchars($unit, ENT_QUOTES);
    if ( ($idx == 0) || ($idx == 1) || ($idx == 2) ){
        // For these fields a match means a forbidden character was found.
        if ($n) {
            echo "Bad Characters in $shown; Alphanumeric only";
        }
    } else {
        // For the email and URL, no match means the format is wrong.
        if ($n == 0) {
            echo "Incorrect Format in $shown; Enter Valid Info";
        }
    }
}
?>