[PHP] String Length, The Right Way
The title may make this look like a very basic topic, since everyone knows you can use the strlen function to get a string's length. But it's not that simple: there are several things to keep in mind when working with string lengths, so let's get into this guide.
The Easy and Common Way
This is the way most of us know, and many coders use it all the time:
$string = 'Hello';
echo strlen($string);
What will this snippet output?
It outputs 5, because the word 'Hello' consists of 5 letters. Good enough. Let's take another example, this time from German.
$string = 'Tschüss';
echo strlen($string);
What will this snippet output?
You would expect 7, because the word 'Tschüss' consists of 7 letters. But that is how humans count, not how strlen() does: for strlen(), 'Tschüss' has a length of 8, so this snippet outputs 8, not 7. You can give it a try.
Why does this happen?
strlen() doesn't count letters; it counts bytes. The letter 'ü' is not part of ASCII, and in UTF-8 a character is not always 1 byte but takes between 1 and 4 bytes. 'ü' takes 2 bytes, so strlen() counted it as 2.
Note: How UTF-8 decides the number of bytes per character is a topic of its own, so we won't go deep into it here; maybe in another tutorial.
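To make the byte counting visible, here is a minimal sketch that prints the byte count and the raw bytes of 'ü'. It assumes PHP 7 or later, because it uses the "\u{...}" escape so the result does not depend on how the file is saved; the variable names are only for illustration.

```php
<?php
// "\u{00FC}" is the precomposed letter 'ü', written as an escape (PHP 7+)
// so the bytes are always the UTF-8 encoding of U+00FC.
$letter = "\u{00FC}";
$word   = "Tsch\u{00FC}ss";

echo strlen($letter), "\n";   // 2    (bytes, not letters)
echo bin2hex($letter), "\n";  // c3bc (the two bytes in hexadecimal)
echo strlen($word), "\n";     // 8    (6 one-byte letters + 1 two-byte letter)
```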
The Right Way
Well, if you're sure the string won't contain any multibyte characters, or you really want to count bytes rather than letters, then strlen() is fine. But if you want to count letters and the string may contain multibyte characters, use another function, <a href="http://php.net/manual/en/function.mb-strlen.php">mb_strlen</a>, as follows:
$string = 'Tschüss';
echo mb_strlen($string, 'UTF-8');
The first parameter is the string you want to measure, and the second is its encoding. This outputs 7, as expected.
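As a rough side-by-side comparison, the sketch below prints both counts for a few sample strings. It assumes the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escapes); the sample words and the $words variable are made up for this illustration.

```php
<?php
// Sample strings written with \u{...} escapes (PHP 7+) so the bytes
// are UTF-8 regardless of how this file is saved.
$words = [
    'Hello',
    "Tsch\u{00FC}ss",               // Tschüss
    "\u{65E5}\u{672C}\u{8A9E}",     // 日本語 (three characters, 3 bytes each)
];

foreach ($words as $word) {
    // strlen() counts bytes; mb_strlen() counts characters in the given encoding.
    printf("%s: %d bytes, %d characters\n", $word, strlen($word), mb_strlen($word, 'UTF-8'));
}
// Hello: 5 bytes, 5 characters
// Tschüss: 8 bytes, 7 characters
// 日本語: 9 bytes, 3 characters
```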
Comparing String Length
What if we have a string and want to check whether its length is less than or equal to 5?
The first thing that comes to mind is this:
$string = 'This is some string';
if (strlen($string) <= 5)
{
    // the length of $string is less than or equal to 5
}
This works and it's perfectly good code, but it's not ideal for performance.
There is a faster way to do this, using isset instead of strlen:
$string = 'This is some string';
if (!isset($string[5]))
{
    // the length of $string is less than or equal to 5
}
This may look weird at first glance, but when you treat a string as an array, the key you access is the character position. In this example, $string[5] is the character at position 5, which is 'i' (PHP strings are zero-indexed, so position 5 is the 6th character). If nothing exists at position 5, the string is at most 5 characters long.
But why is this way better? Because isset is a language construct and strlen is a function, and function calls are generally more expensive than language constructs, so it performs better. The gain is small, and recent PHP versions optimize strlen() calls, so measure before relying on it.
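One caveat is worth checking before using this trick: string offsets, like strlen(), work on bytes, so the isset check can disagree with mb_strlen for multibyte text. A minimal sketch, assuming the mbstring extension and PHP 7 or later; the variable names are only illustrative.

```php
<?php
// 'üüü' written with \u{...} escapes (PHP 7+): 3 characters, 6 bytes in UTF-8.
$string = "\u{00FC}\u{00FC}\u{00FC}";

// Byte-based check: byte offset 5 exists because the string is 6 bytes long,
// so this reports the string as longer than 5.
$shortByBytes = !isset($string[5]);

// Character-based check: 3 characters is within the limit of 5.
$shortByChars = mb_strlen($string, 'UTF-8') <= 5;

var_dump($shortByBytes); // bool(false)
var_dump($shortByChars); // bool(true)
```

So the isset version is a drop-in replacement for strlen, not for mb_strlen.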
Add your questions as comments if you have any. Good luck :)
Written by Samer Moustafa
5 Responses
Great tip!
Here is a quick wrapper function for it (Gist on GitHub):
<?php
function _strlen($str, $use_encoding = false, $encoding = 'utf8') {
    if ($use_encoding) {
        return mb_strlen($str, $encoding);
    }
    return strlen($str);
}
// usage
$string = 'Tschüss';
echo _strlen($string, true);
I like your function. Thanks for sharing :)
Good job on highlighting the necessity of the mb_* functions for handling UTF-8 strings. However, I disagree with your second assertion: the best way to compare string length in PHP is the obvious way. By making a micro-optimisation such as this early, you lose a communicative aspect of your code.
$str = 'hello!';
if (!isset($str[5])) {
    // ... the length of $str is less than or equal to 5
}
if (mb_strlen($str) <= 5) {
    // ... don't need a comment here, the intention is obvious
}
What if I want to check whether a string is longer than 6 characters? Then I am back to using mb_strlen.
These cases should only be targeted for optimisation when actually needed; by taking this practice into common use, you may lose more than you gain.
Take a look at the mbstring.func_overload php.ini setting too: http://us1.php.net/manual/en/mbstring.overload.php (note that this setting was deprecated in PHP 7.2 and removed in PHP 8.0).
Working in a fully UTF-8 'environment' should not cause encoding issues, so always using mb_strlen can be the simplest and fastest approach.
Nice trick with isset() anyway.