What is the fastest way to serialize and unserialize values in PHP?

In PHP one can serialize values in several ways: there are at least serialize, json_encode and var_export. Unserialization can be done with unserialize, json_decode and the rather ugly eval.
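For reference, here is how each pair round-trips a value (a minimal sketch; note that json_decode returns stdClass objects unless you pass true as the second argument, and that eval on var_export output should only ever be used on data you trust):

```php
<?php
$value = array('title' => 'Hello', 'tags' => array('php', 'speed'));

// serialize / unserialize: PHP's own text format
$back = unserialize(serialize($value));

// json_encode / json_decode: pass true to get arrays back instead of stdClass
$back = json_decode(json_encode($value), true);

// var_export / eval: var_export(..., true) returns valid PHP source,
// which eval turns back into a value
$back = eval('return ' . var_export($value, true) . ';');
```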

While building the search functionality for this site I started wondering which of them is the fastest.

So I wrote this script to test it:

<?php
date_default_timezone_set('Europe/Helsinki');

ini_set('display_errors', 1);
error_reporting(E_ALL);

echo "synthetic:\n";
$data = generate_synthetic_data(0, 5);
$data = test_performance($data, 10);
show_results($data);

echo "realistic:\n";
$data = generate_realistic_data();
$data = test_performance($data, 10);
show_results($data);

// just prints out averages for each serialization function
function show_results($data) {
  echo '<pre>';
  foreach ($data as $function => $results) {
    $total = 0;
    $average = 0;
    echo $function."\n";
    foreach ($results as $iteration => $time) {
      //echo "\t".$time."\n";
      $total += $time;
    }
    $average = $total/count($results);
    echo "avg:\t".$average."\n\n";
  }
  echo '</pre>';
}

// helper function to generate a random string of the given length
function random_string($length) {
  $keys = array_merge(range(0,9), range('a', 'z'));
  $key = '';
  for($i=0; $i < $length; $i++) {
    $key .= $keys[array_rand($keys)];
  }
  return $key;
}

// generates 1000 blog-post-like arrays
function generate_realistic_data() {
  $data = array();
  $post = array();

  for ($i = 0; $i < 1000; $i++) {
    $date = mt_rand(1262304000, 1325376000);
    $title_len = mt_rand(1, 100);
    $content_len = mt_rand(100, 2000);
    $content_html_len = mt_rand($content_len-50, $content_len+500);
    $author_len = mt_rand(5, 20);

    $post['date'] = date("Y-m-d H:i:s", $date);
    $post['title'] = random_string($title_len);
    $post['content'] = random_string($content_len);
    $post['content_html'] = random_string($content_html_len);
    $post['author'] = random_string($author_len);
    $data[] = $post;
  }
  return $data;
}

// generates nested arrays
function generate_synthetic_data($depth, $max) {
  static $seed;
  if (is_null($seed)) {
    $seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
  }
  if ($depth < $max) {
    $node = array();
    foreach ($seed as $key) {
      $node[$key] = generate_synthetic_data($depth + 1, $max);
    }
    return $node;
  }
  return 'empty';
}

// runs tests with given data
function test_performance($data, $iterations) {
  $json_encode = array();
  $json_decode = array();
  $serialize = array();
  $unserialize = array();
  $var_export = array();
  $eval = array();
  $results = array();

  for ($i=0; $i < $iterations; $i++) { 
    // json_encode
    $json_encoded_data = array();
    $start = microtime(true);
    foreach ($data as $key => $value) {
      $json_encoded_data[] = json_encode($value);
    }
    $time_json = microtime(true) - $start;
    $json_encode[] = $time_json;

    // serialize
    $serialized_data = array();
    $start = microtime(true);
    foreach ($data as $key => $value) {
      $serialized_data[] = serialize($value);
    }
    $time_serialize = microtime(true) - $start;
    $serialize[] = $time_serialize;

    // var_export
    $exported_data = array();
    ob_start(); // belt and braces: with true as the second argument, var_export returns the string instead of echoing it
    $start = microtime(true);
    foreach ($data as $key => $value) {
      $exported_data[] = var_export($value, true);
    }
    $time_export = microtime(true) - $start;
    ob_end_clean();
    $var_export[] = $time_export;

    // json_decode
    $start = microtime(true);
    foreach ($json_encoded_data as $key => $value) {
      json_decode($value);
    }
    $time_json_decode = microtime(true) - $start;
    $json_decode[] = $time_json_decode;

    // unserialize
    $start = microtime(true);
    foreach ($serialized_data as $key => $value) {
      unserialize($value);
    }
    $time_unserialize = microtime(true) - $start;
    $unserialize[] = $time_unserialize;

    // eval
    $start = microtime(true);
    foreach ($exported_data as $key => $value) {
      eval('return '.$value.';');
    }
    $time_eval = microtime(true) - $start;
    $eval[] = $time_eval;
  }

  $results['json_encode'] = $json_encode;
  $results['serialize'] = $serialize;
  $results['var_export'] = $var_export;
  $results['json_decode'] = $json_decode;
  $results['unserialize'] = $unserialize;
  $results['eval'] = $eval;

  return $results;
}

Spage.fi runs on an Amazon EC2 micro instance, and the results there look like this:

synthetic:

json_encode
avg:    0.032392120361328

serialize
avg:    0.076668643951416

var_export
avg:    0.070131397247314

json_decode
avg:    0.11707892417908

unserialize
avg:    0.083262372016907

eval
avg:    0.16133031845093


realistic:

json_encode
avg:    0.034033703804016

serialize
avg:    0.0052637815475464

var_export
avg:    0.016560435295105

json_decode
avg:    0.040717649459839

unserialize
avg:    0.0040281295776367

eval
avg:    0.017697238922119

The "synthetic" results use deeply nested arrays where all values are short. The "realistic" results use flat arrays that mirror a typical blog post: no nesting, a few keys, and values from 1 to about 2500 characters.

So based on this test, with shallow arrays serialize and unserialize are roughly six to ten times faster than json_encode and json_decode, and three to four times faster than var_export and eval. With deeply nested arrays json_encode is about twice as fast as serialize or var_export, but on the decoding side unserialize is still the fastest, with json_decode in the middle and eval clearly the slowest.
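To repeat the comparison on your own data, a stripped-down version of the harness above could look like this (bench is a hypothetical helper name, not part of the script above):

```php
<?php
// Hypothetical helper: print the average time of running $fn over
// every payload, repeated $iterations times.
function bench($label, $fn, array $payloads, $iterations = 10) {
    $total = 0.0;
    for ($i = 0; $i < $iterations; $i++) {
        $start = microtime(true);
        foreach ($payloads as $payload) {
            $fn($payload);
        }
        $total += microtime(true) - $start;
    }
    printf("%s\tavg: %.6f\n", $label, $total / $iterations);
}

// 1000 shallow, blog-post-like arrays, as in the "realistic" test
$payloads = array_fill(0, 1000, array(
    'title' => str_repeat('t', 50),
    'content' => str_repeat('c', 1000),
));
bench('serialize', 'serialize', $payloads);
bench('json_encode', 'json_encode', $payloads);
```

Built-in function names like 'serialize' can be passed directly as callables, so no wrapper closures are needed for the encode side.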