Flippin’ Fast Array Searches in PHP

F
U. S. Air Force photo by Sue Sapp

Recently, while optimizing some code to use less memory, I had to manually compare two arrays rather than using PHP’s built-in functions.  Given the sizes of the arrays involved processing was far too slow to be practical. While considering my options (and groaning at the amount of work they would take) it occurred to me that a simple trick might just do the job.

Since PHP actually implements arrays as hash tables I suspected that checking the existence of a key would be much faster than scanning every value in a large array.

Using PHP’s array_flip function I converted one array’s values into keys and searched the resulting array with array_key_exists rather than the original array’s values with in_array. This improved the search time by more than three orders of magnitude – the equivalent of reducing a 15-minute search to less than a second!

Later I wrote a quick test program to do some performance profiling and found much the same result.  In the code below I compared an array of 100,000 randomly generated five-character strings against a dictionary array of about 100,000 entries.  Iterating over the character string array with in_array would on average require 100,000 * 50,000 = 5 billion string comparisons, which over several runs averaged about 26 seconds.  Conversely, the method described above required much closer to a linear number of operations and averaged about 1/125 of a second.

<?php

//Generate 100,000 random 5-character strings
$randCharString = '';
for($i = 0; $i < 100000; $i++){
    foreach(range(0,4) as $j){
        $randCharString{$j} = chr(rand(97,122));
    }
    $jumbles[] = $randCharString;   
}

//Get dictionary arrays
$dictionary = file('/usr/share/dict/words', FILE_IGNORE_NEW_LINES);
$dictionaryFlipped = array_flip($dictionary);


//Search standard (non-flipped) array
$stdMatches = [];
$time = microtime(true);

foreach($jumbles as $jumble){
    if(in_array($jumble, $dictionary)){
        $stdMatches[] = $jumble;
    }   
}

$deltaStdArray = microtime(true) - $time;

printf('Found %d matches for non-flipped array in %.4f seconds<br>', count($stdMatches), $deltaStdArray);
echo '--------------------------------------------------------------------------------------------<br>';
echo implode($stdMatches, '<br>') . '<br><br>';


//Search flipped array
$flippedMatches = [];
$time = microtime(true);

foreach($jumbles as $jumble){
    if(array_key_exists($jumble, $dictionaryFlipped)){
        $flippedMatches[] = $jumble;
    }   
}

$deltaFlippedArray = microtime(true) - $time;

printf('Found %d matches for flipped array in %.4f seconds<br>', count($flippedMatches), $deltaFlippedArray);
echo '--------------------------------------------------------------------------------------------<br>';
echo implode($flippedMatches, '<br>');

Of course, if your data sets are small enough that memory is not an issue, you are better off using PHP’s built in array comparison functions which are extremely fast.  In my case I was comparing arrays of about 1-million elements each and wanted to keep within PHP’s default memory limit of 128M.

About the author

Chris Peterson

As a Web Application Developer & Elephant Trainer I have been putting the PHP mascot to work for more than a decade. I specialize in back-end development and use the LAMP stack to craft software that frees human beings to spend their time on more productive and rewarding things.

By Chris Peterson

Chris Peterson

As a Web Application Developer & Elephant Trainer I have been putting the PHP mascot to work for more than a decade. I specialize in back-end development and use the LAMP stack to craft software that frees human beings to spend their time on more productive and rewarding things.

Recent Posts

Recent Comments

Archives