Benchmarking BSON, JSON, and Native Serializing in PHP
I was writing a service that lets loosely structured data be stored in MongoDB via a REST API, and I ran into a small problem: how can I validate the size of the incoming data while still allowing a loose structure?
See, I didn't want to force a particular structure on the developer, but PHP doesn't have a built-in way of getting the "size" of a variable. Since PHP is loosely typed, it's very difficult to determine how much memory a value uses or how much space it would take to store.
After a bit of research and thinking, I realized that the best bet for a relative size comparison would be to serialize the data into a string and then use strlen() to get the size, in bytes, of the data to validate. This works especially well because PHP's strlen() doesn't count "characters"; it returns the number of bytes in the string. Strange, but perfect!
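To make the byte-counting idea concrete, here is a minimal sketch. The helper name encoded_size() is hypothetical (it is not from the benchmark gist), and it uses serialize() only because that function is always available; any encoder that returns a string would work the same way.

```php
<?php
// strlen() reports bytes, not characters: "é" occupies 2 bytes in UTF-8,
// so this 5-character string is 6 bytes long.
$text = "h\xC3\xA9llo"; // "héllo" written with explicit UTF-8 bytes
$byteLength = strlen($text);

// Hypothetical helper: approximate the "size" of any value as the
// byte length of its serialized form.
function encoded_size($data): int
{
    return strlen(serialize($data));
}

$payload = ['name' => 'example', 'tags' => ['a', 'b']];
printf("String bytes: %d, payload bytes: %d\n", $byteLength, encoded_size($payload));
```

The number is only a relative measure: it depends on the serialization format chosen, not on how much memory PHP actually uses for the value.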
Now, for the next dilemma... what's the most efficient and consistent serialization format for testing the size of the data? After a bit, I figured "hell, this data is being stored in MongoDB, why not use its native serialization format?". But was using bson_encode() going to be efficient?
So, naturally, I benchmarked it. What I found might be surprising to some:
BSON encoding and decoding turned out to be significantly faster than JSON encoding and decoding, and PHP's native serialization format was somewhere in between.
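For readers who want to reproduce this kind of comparison, the following is a rough sketch of a timing harness. It is not the code from the gist: bench_format() and the sample data are made up for illustration, and the BSON entry is only added when the legacy mongo extension's bson_encode()/bson_decode() functions happen to be installed.

```php
<?php
// Hypothetical harness: time N encode calls and N decode calls for one format.
function bench_format(callable $encode, callable $decode, $data, int $iterations): array
{
    $encoded = $encode($data);

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $encode($data);
    }
    $encodeTime = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $decode($encoded);
    }
    $decodeTime = microtime(true) - $start;

    return [
        'encode' => $encodeTime,
        'decode' => $decodeTime,
        'total'  => $encodeTime + $decodeTime,
        'size'   => strlen($encoded), // encoded size in bytes
    ];
}

$formats = [
    'native' => ['serialize', 'unserialize'],
    'json'   => ['json_encode', function ($s) { return json_decode($s, true); }],
];

// Assumption: bson_encode()/bson_decode() come from the legacy mongo extension.
if (function_exists('bson_encode') && function_exists('bson_decode')) {
    $formats['bson'] = ['bson_encode', 'bson_decode'];
}

// Made-up sample data; real results depend heavily on the shape of the payload.
$sample = ['id' => 42, 'name' => 'example', 'tags' => ['a', 'b', 'c'], 'nested' => ['x' => 1.5, 'y' => null]];

$results = [];
foreach ($formats as $name => [$encode, $decode]) {
    $results[$name] = bench_format($encode, $decode, $sample, 10000);
    printf(
        "%s: encode %.4fs, decode %.4fs, total %.4fs, %d bytes\n",
        $name,
        $results[$name]['encode'],
        $results[$name]['decode'],
        $results[$name]['total'],
        $results[$name]['size']
    );
}
```

Timings from a loop like this vary with PHP version, extension versions, and the data being encoded, so treat them as a relative comparison on one machine rather than absolute figures.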
So, although it may not be a perfect way to check the size of a particular piece of data in PHP, it does a great job of determining a relative size and at least making sure that a user isn't abusing the system by attempting to enter tons of data.
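As an illustration of that kind of guard, here is a small sketch of a size check. The function name payload_within_limit() and the 1024-byte limit are assumptions for the example, and serialize() stands in for whichever encoder you settle on.

```php
<?php
// Hypothetical guard: accept the data only if its encoded form
// fits within a byte budget.
function payload_within_limit($data, int $maxBytes): bool
{
    return strlen(serialize($data)) <= $maxBytes;
}

$maxBytes = 1024; // assumed limit, pick whatever suits your service

$small = ['title' => 'hello'];
$large = ['blob' => str_repeat('x', 2000)];

var_dump(payload_within_limit($small, $maxBytes)); // bool(true)
var_dump(payload_within_limit($large, $maxBytes)); // bool(false)
```

In a REST API, a failed check would typically translate into rejecting the request (for example with a 413 response) before anything is written to the database.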
If you'd like to check out the benchmark, view the code, modify it, or share an idea, it's at https://gist.github.com/Rican7/6457237. It's a gist, so feel free to fork it! :)
Written by Trevor N. Suarez
4 Responses
Could you also add MessagePack (http://msgpack.org/), and igbinary to the mix?
@bungle It's certainly possible. If you'd like to see how those perform, you're welcome to modify the code, as the source is here:
https://gist.github.com/Rican7/6457237
@bungle @rican7 I just forked and modified the source to add MessagePack benchmarking. I haven't been able to test my addition to the code yet, as I only have a Windows box at work.
Running benchmark for...
native
10000 times
Test completed!!
Encoding time: 0.12195706367493
Decoding time: 0.11269879341125
Total time: 0.23465585708618
Encoded size: 1122 bytes
Running benchmark for...
json
10000 times
Test completed!!
Encoding time: 0.12404799461365
Decoding time: 0.47142910957336
Total time: 0.59547710418701
Encoded size: 808 bytes
Running benchmark for...
bson
10000 times
Test completed!!
Encoding time: 0.051403045654297
Decoding time: 0.059870004653931
Total time: 0.11127305030823
Encoded size: 900 bytes
Running benchmark for...
igbinary
10000 times
Test completed!!
Encoding time: 0.15784883499146
Decoding time: 0.067402124404907
Total time: 0.22525095939636
Encoded size: 517 bytes
Running benchmark for...
msgpack
10000 times
Test completed!!
Encoding time: 0.052465200424194
Decoding time: 0.093418836593628
Total time: 0.14588403701782
Encoded size: 667 bytes