Last Updated: February 25, 2016
· richthegeek

The Pareto Principle

My work lately has felt like a solid month of "almost there". We have constantly been in a state where the system almost works save for a few annoying bugs and one known major issue.

Solving one always led to the next issue cropping up within a few hours, and each issue felt more fatal than the last. The last three days in particular have been particularly foul.

It started with running up against the size limit for beanstalk objects, at a measly 64kb. We were trying, for perfectly valid reasons, to store about 30mb of data in an object (as a task in a queue), and of course it was choking. So we solved this by adding Redis to the stack of technologies we are using. So now Beanstalk holds the task with a pointer to a Redis object
which holds the big part of the data. This solved the beanstalk issue.

And then we found that the system was trying to process everything multiple times because of data timeouts (loading 25k records from Mongo takes a bit of time, funnily). The broad reasoning is that we weren't locking the schedules as we came across them so they were being processed multiple times. Again, Redis came to the rescue - due to the structure of the task (a tree of processes (nodes) with dependencies (edges)) we could simply track where each process was (pending, running, failed, or completed). So that solved that issue.

Having switched that over, we thought we were home and dry. Nope. The first issue punched us in the face again with Mongo. That too has an object size limit, of 16mb (for non-GridFS collections). Happily, solving that was simply a case of splitting up the data into individual records (the data was congruent to this format).

I'm not entirely sure how much of that makes sense - context is a funny thing - but the gist of it is that the constant "almost there" feeling has been particularly powerful lately. Which brings us to the title of this post...

  • 80% of X is consumed by 20% of Y *
  • 80% of effort is expended on 20% of a project *
  • 80% of SLOC in 20% of the schedule *

This is a fairly universal law, cropping up in just about everything to some degree. It is especially common, it seems, in software development. Most bits of software can be described as "a bit of secret sauce with a bunch of stitching to make it work". And so 20% of the code takes 80% of the effort. Or 80% of the value is derived from 20% of the code. Or, to take it further, 80% of a company's profit is produced by 20% of it's employees.

So how do we reconcile scheduling, planning, and hapiness with the inherent "almost there" that comes with the Pareto principle?

Another time, for that one - in keeping with the theme of this post I'll leave it 80% too long with 20% not said.