I was writing a service that allows a pretty loosely structured data model to be stored in MongoDB via a REST API and ran into a small problem: how can I validate the size of the data being input while still allowing a loose structure?
See, I didn't want to force a particular structure on the developer, but PHP doesn't have a way of getting the "size" of a variable. Since PHP is loosely typed, it's very difficult to determine how much memory something uses or how much space something would take to store.
After doing a bit of research and thinking, I realized that the best bet for a relative size comparison would be to serialize the data into a string and then use strlen() to figure out the size, in bytes, of the data to validate. This works especially well, since PHP's strlen() doesn't actually count "characters"; it counts the number of "bytes" in a string, so a multi-byte UTF-8 character adds more than one to the total. Strange, but perfect!
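Here's a minimal sketch of that serialize-and-measure idea. The function names and the 16 KB limit are made up for illustration (they aren't from my service), and the type declarations assume PHP 7 or newer. It uses serialize() only because it's always available; any encoder that returns a string would slot in the same way.

```php
<?php
// Hypothetical helper: approximate the size of a value, in bytes,
// by serializing it and measuring the resulting string.
function approximateSizeInBytes($data): int
{
    // strlen() counts bytes, not characters.
    return strlen(serialize($data));
}

// Hypothetical limit, chosen only for this example.
const MAX_DOCUMENT_BYTES = 16384;

// Hypothetical validator: true when the serialized form fits the limit.
function isWithinSizeLimit($data, int $maxBytes = MAX_DOCUMENT_BYTES): bool
{
    return approximateSizeInBytes($data) <= $maxBytes;
}

// "caf" followed by a two-byte UTF-8 "e with acute accent":
// 4 characters, 5 bytes.
var_dump(strlen("caf\xC3\xA9")); // int(5)

$small = ['name' => 'example', 'tags' => ['a', 'b']];
$large = ['blob' => str_repeat('x', 20000)];

var_dump(isWithinSizeLimit($small)); // bool(true)
var_dump(isWithinSizeLimit($large)); // bool(false)
```

The number you get back is the size of the serialized form, not the memory PHP uses for the variable, so treat it as a relative measure.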
Now, for the next dilemma... what's the most efficient and consistent serialization format for testing the size of the data? After a bit, I figured "hell, this data is being stored in MongoDB, why not use its native serialization format?" But was using bson_encode() going to be efficient?
So, naturally, I benchmarked it. What I found might be surprising to some:
BSON encoding and decoding turned out to be significantly faster than JSON encoding and decoding, and PHP's native serialization format was somewhere in between.
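If you want to try a comparison like this yourself, here's a rough sketch of how the encoding side could be timed. This is not the benchmark from the gist: the helper name, the sample document, and the iteration count are all invented for this example, and it only measures encoding. bson_encode() comes from the legacy mongo extension, so the sketch only includes it when the function exists.

```php
<?php
// Hypothetical helper: time $iterations calls of $encoder on $data.
function timeEncoder(callable $encoder, $data, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $encoder($data);
    }
    return microtime(true) - $start;
}

// Made-up sample document; real payloads will behave differently.
$sample = [
    'title' => 'example',
    'count' => 42,
    'tags'  => ['php', 'mongodb', 'bson'],
    'meta'  => ['nested' => ['a' => 1, 'b' => 2.5, 'c' => true]],
];

$encoders = [
    'json'      => 'json_encode',
    'serialize' => 'serialize',
];

// Only benchmark BSON when the extension providing bson_encode() is loaded.
if (function_exists('bson_encode')) {
    $encoders['bson'] = 'bson_encode';
}

$results = [];
foreach ($encoders as $name => $encoder) {
    $results[$name] = [
        'seconds' => timeEncoder($encoder, $sample, 10000),
        'bytes'   => strlen($encoder($sample)),
    ];
}

// Print elapsed time and encoded size for each format.
foreach ($results as $name => $r) {
    printf("%-10s %8.4fs  %d bytes\n", $name, $r['seconds'], $r['bytes']);
}
```

Timings will vary with the PHP version, the extensions installed, and the shape of the data, so run it against documents that look like your own before drawing conclusions.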
So, although it may not be a perfect way to check the size of a particular piece of data in PHP, it does a great job of determining a relative size and at least making sure that a user isn't abusing the system by attempting to enter tons of data.
If you'd like to check out the benchmark, view the code, modify it, or have an idea, check it out here. It's a gist, so feel free to fork it! :)