Last Updated: May 15, 2019 · rican7

Benchmarking BSON, JSON, and Native Serializing in PHP

I was writing a service that allows a loosely structured data model to be stored in MongoDB via a REST API, and I ran into a small problem: how can I validate the size of the data being input while still allowing a loose structure?

See, I didn't want to force a particular structure on the developer, but PHP doesn't have a way of getting the "size" of a variable. Since PHP is loosely typed, it's very difficult to determine how much memory something uses or how much space something would take to store.

After a bit of research and thinking, I realized that the best bet for a relative size comparison would be to serialize the data into a string and then use strlen() to get the size of that string, in bytes, to validate against. This works especially well because PHP's strlen() doesn't count "characters"; it returns the number of bytes in the string, so a multi-byte UTF-8 character counts more than once. Strange, but perfect!
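Here's a minimal sketch of that idea. The helper name serializedSize() and the sample data are made up for illustration, and it uses PHP's built-in serialize() rather than any particular format:

```php
<?php

// Hypothetical helper: approximate the size of a value by serializing it
// and counting the bytes of the resulting string.
function serializedSize($data): int
{
    return strlen(serialize($data));
}

// strlen() counts bytes, not characters: "é" takes two bytes in UTF-8.
$ascii = 'cafe';
$utf8  = "caf\xC3\xA9"; // "café" written as raw UTF-8 bytes

echo strlen($ascii), PHP_EOL; // 4
echo strlen($utf8), PHP_EOL;  // 5

// The serialized size grows with the byte length of the data.
echo serializedSize(['name' => $ascii]), PHP_EOL;
echo serializedSize(['name' => $utf8]), PHP_EOL;
```

The number you get depends on the serialization format, so it's only meaningful as a relative measure.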

Now, for the next dilemma... what's the most efficient and consistent serialization format for testing the size of the data? After a bit, I figured, "hell, this data is being stored in MongoDB, why not use their native serialization format?" But was using bson_encode() going to be efficient?

So, naturally, I benchmarked it. What I found might be surprising to some:

[Picture: benchmark results]

BSON encoding and decoding turned out to be significantly faster than JSON encoding and decoding, and PHP's native serialization format was somewhere in between.
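For a rough idea of how a comparison like this can be set up, here's a minimal sketch. It is not the code from the gist: the benchmark() helper and the sample data are hypothetical, and BSON is only included if bson_encode() and bson_decode() from the legacy mongo extension happen to be available.

```php
<?php

// Hypothetical helper: time $iterations encode calls and $iterations decode
// calls, and report the encoded size in bytes.
function benchmark(callable $encode, callable $decode, $data, int $iterations): array
{
    // Encode once up front so $encoded is always defined.
    $encoded = $encode($data);

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $encoded = $encode($data);
    }
    $encodeTime = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $decode($encoded);
    }
    $decodeTime = microtime(true) - $start;

    return [
        'encode' => $encodeTime,
        'decode' => $decodeTime,
        'total'  => $encodeTime + $decodeTime,
        'size'   => strlen($encoded),
    ];
}

// Encoder/decoder pairs to compare.
$formats = [
    'native' => ['serialize', 'unserialize'],
    'json'   => ['json_encode', function ($s) { return json_decode($s, true); }],
];

// Assumption: only present when the legacy mongo extension is loaded.
if (function_exists('bson_encode') && function_exists('bson_decode')) {
    $formats['bson'] = ['bson_encode', 'bson_decode'];
}

// Made-up sample document.
$data = [
    'title' => 'Example',
    'tags'  => ['php', 'bson', 'json'],
    'meta'  => ['views' => 1234, 'ratio' => 0.75, 'public' => true],
];

foreach ($formats as $name => [$encode, $decode]) {
    $r = benchmark($encode, $decode, $data, 10000);
    printf(
        "%-8s encode: %.4fs  decode: %.4fs  total: %.4fs  size: %d bytes\n",
        $name, $r['encode'], $r['decode'], $r['total'], $r['size']
    );
}
```

Timings will vary by machine, PHP version, and the shape of the data, so treat any numbers as relative.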

So, although it may not be a perfect way to check the size of a particular piece of data in PHP, it does a great job of determining a relative size and at least making sure that a user isn't abusing the system by attempting to enter tons of data.
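A check like that could look something like the sketch below. The limit, the constant name MAX_PAYLOAD_BYTES, and the function name isWithinSizeLimit() are all hypothetical, and it uses serialize() so it runs without any extension; swap in whichever encoder you settle on.

```php
<?php

// Hypothetical limit; pick whatever ceiling suits the service.
const MAX_PAYLOAD_BYTES = 16384;

// Returns true when the serialized form of $data fits within $maxBytes.
function isWithinSizeLimit($data, int $maxBytes = MAX_PAYLOAD_BYTES): bool
{
    return strlen(serialize($data)) <= $maxBytes;
}

$small = ['title' => 'hello', 'tags' => ['a', 'b']];
$huge  = ['blob' => str_repeat('x', 100000)];

var_dump(isWithinSizeLimit($small)); // bool(true)
var_dump(isWithinSizeLimit($huge));  // bool(false)
```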

If you'd like to check out the benchmark, view the code, modify it, or have an idea, check it out here: https://gist.github.com/Rican7/6457237. It's a gist, so feel free to fork it! :)

4 Responses

Could you also add MessagePack (http://msgpack.org/), and igbinary to the mix?

over 1 year ago ·

@bungle It's certainly possible. If you'd like to see how those perform, you're welcome to modify the code, as the source is here:
https://gist.github.com/Rican7/6457237

over 1 year ago ·

@bungle @rican7 I just forked and modified the source to add MessagePack benchmarking. I haven't been able to test my addition to the code yet, as I only have a Windows box at work.

over 1 year ago ·

Running benchmark for... native, 10000 times
Test completed!!
Encoding time: 0.12195706367493
Decoding time: 0.11269879341125
Total time: 0.23465585708618
Encoded size: 1122 bytes

Running benchmark for... json, 10000 times
Test completed!!
Encoding time: 0.12404799461365
Decoding time: 0.47142910957336
Total time: 0.59547710418701
Encoded size: 808 bytes

Running benchmark for... bson, 10000 times
Test completed!!
Encoding time: 0.051403045654297
Decoding time: 0.059870004653931
Total time: 0.11127305030823
Encoded size: 900 bytes

Running benchmark for... igbinary, 10000 times
Test completed!!
Encoding time: 0.15784883499146
Decoding time: 0.067402124404907
Total time: 0.22525095939636
Encoded size: 517 bytes

Running benchmark for... msgpack, 10000 times
Test completed!!
Encoding time: 0.052465200424194
Decoding time: 0.093418836593628
Total time: 0.14588403701782
Encoded size: 667 bytes

over 1 year ago ·