Benchmarking BSON, JSON, and Native Serializing in PHP


I was writing a service that allows a loosely structured data model to be stored in MongoDB via a REST API, and I ran into a small problem: how can I validate the size of the data being input while still allowing a loose structure?

See, I didn't want to force a particular structure on the developer, but PHP doesn't have a way of getting the "size" of a variable. Since PHP is loosely typed, it's very difficult to determine how much memory something uses or how much space it would take to store.

After a bit of research and thinking, I realized that the best bet for a relative size comparison would be to serialize the data into a string and then use strlen() to get the size, in bytes, of the data to validate. This works especially well because PHP's strlen() doesn't count "characters"; it returns the number of bytes in the string. Strange, but perfect!
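To make the byte-counting idea concrete, here is a minimal sketch. The helper name serialized_size() is hypothetical (it is not from the benchmark gist), and it uses PHP's built-in serialize() purely as an example format.

```php
<?php
// Hypothetical helper, for illustration only: size of a value in bytes
// once it has been run through PHP's native serialize().
function serialized_size($data)
{
    return strlen(serialize($data));
}

// strlen() counts bytes, not characters.
// "caf\xC3\xA9" is "café" in UTF-8: 4 characters, but 5 bytes.
$word = "caf\xC3\xA9";
echo strlen($word), "\n"; // prints 5

// The serialized form carries type and length markers, so it is larger
// than the raw string, but it gives a consistent relative measure.
echo serialized_size(array('name' => $word)), "\n";
```

The absolute number depends on the format chosen, which is why it only works as a relative measure.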

Now, for the next dilemma: what's the most efficient and consistent serialization format for testing the size of the data? After a bit, I figured "hell, this data is being stored in MongoDB, why not use its native serialization format?" But was using bson_encode() going to be efficient?

So, naturally, I benchmarked it. What I found might be surprising to some:

BSON encoding and decoding turned out to be significantly faster than JSON encoding and decoding, and PHP's native serialization format was somewhere in between.
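For anyone who wants to try a comparison like this locally, the following is a rough sketch of a timing loop, not the code from the gist. The function bench_format() and the sample data are made up for illustration. bson_encode() and bson_decode() come from the legacy MongoDB extension, so they are only included when that extension is loaded.

```php
<?php
// Hypothetical timing helper: runs the encoder and the decoder $iterations
// times each and reports wall-clock seconds plus the encoded size in bytes.
function bench_format($encode, $decode, $data, $iterations = 10000)
{
    $encoded = call_user_func($encode, $data);

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($encode, $data);
    }
    $encodeTime = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($decode, $encoded);
    }
    $decodeTime = microtime(true) - $start;

    return array(
        'encode' => $encodeTime,
        'decode' => $decodeTime,
        'total'  => $encodeTime + $decodeTime,
        'size'   => strlen($encoded),
    );
}

// Formats that ship with PHP; BSON is added only if the extension is present.
$formats = array(
    'native' => array('serialize', 'unserialize'),
    'json'   => array('json_encode', 'json_decode'),
);
if (function_exists('bson_encode')) {
    $formats['bson'] = array('bson_encode', 'bson_decode');
}

// Made-up sample data; real results depend heavily on the shape of the data.
$sample = array('id' => 42, 'tags' => array('a', 'b'), 'nested' => array('ok' => true));

$results = array();
foreach ($formats as $name => $pair) {
    $results[$name] = bench_format($pair[0], $pair[1], $sample, 1000);
    printf("%-8s total: %.6f  size: %d bytes\n", $name, $results[$name]['total'], $results[$name]['size']);
}
```

Timings from a loop like this vary with the machine, the PHP version, and the data, so they are best read as relative comparisons.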

So, although it may not be a perfect way to check the size of a particular piece of data in PHP, it does a great job of determining a relative size and at least making sure that a user isn't abusing the system by attempting to enter tons of data.
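As a sketch of what such a guard could look like in a request handler: the limit, the constant name, and the function name below are all hypothetical, and serialize() stands in for the encoder because it ships with PHP. Passing 'bson_encode' as the encoder would be the equivalent with the legacy MongoDB extension loaded.

```php
<?php
// Hypothetical limit, chosen only for illustration.
define('MAX_PAYLOAD_BYTES', 16384);

// Hypothetical guard: encodes the data, measures it in bytes, and throws
// if it exceeds the limit. Returns the measured size otherwise.
function check_payload_size($data, $maxBytes = MAX_PAYLOAD_BYTES, $encoder = 'serialize')
{
    $size = strlen(call_user_func($encoder, $data));
    if ($size > $maxBytes) {
        throw new LengthException("Payload is $size bytes; limit is $maxBytes bytes");
    }
    return $size;
}

// Usage: a small document passes, an oversized one is rejected.
check_payload_size(array('title' => 'hello'));

try {
    check_payload_size(array('blob' => str_repeat('x', 20000)));
} catch (LengthException $e) {
    echo $e->getMessage(), "\n";
}
```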

If you'd like to check out the benchmark, view the code, modify it, or have some ideas, check it out here: https://gist.github.com/Rican7/6457237. It's a gist, so feel free to fork it! :)

Comments

  • E86fe41e9dc23940297715ec32cdfdf1

    Could you also add MessagePack (http://msgpack.org/), and igbinary to the mix?

  • 08-16-2009_-_niantic_boardwalk__3___square_

    @bungle It's certainly possible. If you'd like to see how those perform, you're welcome to modify the code, as the source is here: https://gist.github.com/Rican7/6457237

  • B70a47ae98f756f126346f554a3d2e95

    @bungle @rican7 I just forked and modified the source to add MessagePack benchmarking. I haven't been able to test my addition to the code yet, as I only have a Windows box at work.

  • 3cbd85f025644777d045652cabab84ca

    Benchmark results, 10000 iterations per format:

    Format     Encoding time       Decoding time       Total time          Encoded size
    native     0.12195706367493    0.11269879341125    0.23465585708618    1122 bytes
    json       0.12404799461365    0.47142910957336    0.59547710418701    808 bytes
    bson       0.051403045654297   0.059870004653931   0.11127305030823    900 bytes
    igbinary   0.15784883499146    0.067402124404907   0.22525095939636    517 bytes
    msgpack    0.052465200424194   0.093418836593628   0.14588403701782    667 bytes
