Joined September 2013
·

Greg Jorgensen

Thailand
·
·

Ugh. Bloat your primary key from an INT to a long string, making indexes much bigger, comparisons take longer. This might make sense if you are sharding and don't have a real primary key, but I can't think of any other reason to do it.

Posted to SQL Queries Kill Website Scalability over 1 year ago

jfay_dba... I know the author refers to "patterns" but the article is a mixture of very specific advice complete with code examples and unsubstantiated claims about the evils of relational JOINs. I don't see any actual patterns here, just straw man arguments.

It's obvious that architecting an application to not hit the database any more than necessary will probably speed things up: "We can probably all agree that the fastest websites you can write wouldn’t use databases at all." Yes, I agree, but that is neither a design pattern nor is it scalable.

Switching gears from PHP session management to using UUIDs as primary keys to relational joins the author writes "A JOIN’s is like telling a waiter to get multiple orders at once. It's part of their job, but if you try to give that waiter 1,000 orders all at the same time, the waiter does a pretty bad job." Actually a JOIN is more like having the waiter bring the entire order out at once to serve a table rather than making separate trips to the kitchen for each dish.

If you decide not to store related data in normalized tables and use the highly-optimized RDBMS to do the work you have two alternatives: Do it yourself by writing the join lookup code in the application (an anti-pattern I see quite a bit, perpetuated by simplistic ORMs), or by abandoning normalization and stuffing everything in a single table. Neither of those scale nearly as well as the relational solution.

The author has not made a clear case against the relational model or RDBMSs, just some vague statements about JOINs killing websites. Nor has the author offered a workable alternative. There are no design patterns at all that I can discern. The advice about UUIDs and not using JOINs is terrible and flies in the face of years of relational database experience, including some real time-tested patterns.

Sure, there are applications where the relational model is not the best fit. Those tend to be at the very low end (toy to-do list apps and recipe managers) and at the very high end (web search engine data). I don't think a database of favorite movies is going to tax a relational database.

There are always cases where denormalization is a solution. The author is advocating unnormalized data -- by eschewing JOINs from the outset the data will never be normalized (and therefore can't be denormalized). For the rare databases that scale to the point where sharding and denormalization are actually needed, great. But in my experience those cases are rare and occur in organizations that have resources to use high-end RDBMSs such as Oracle. The large majority of databases can be handled just fine with normal proven relational techniques. If those are failing you first look at your schema design, your indexes, and your queries.

When I used to do a lot of C and C++ programming I would sometimes work with programmers who blamed every bug in their code on the compiler or the library. I've run into compiler bugs once or twice in my career. I've run into bugs introduced by other programmers tens of thousands of times. The same goes for databases: if your database is slow or crashing or not working for your application you should first look at your own code and database design and not assume relational theory and SQL sucks because you're using it wrong.

And thanks for spreading bad database/SQL ideas around. I make a good living cleaning up the mess left by programmers who do stuff like kocyigityunus advocates in the comments. If you think half-baked solutions like that scale better (including scaling over time) than proven relational techniques please refer your clients to me when you can't keep them happy.

You are 100% wrong about JOINs. You are advocating a solution that is much worse than the problem you are trying to fix. If your queries are slow because of JOINs your database is poorly designed or your queries are badly written, it's a simple as that. You are casually dismissing 30+ years of RDBMS experience.

The integrity of the data in your database is much more important than your concerns about speed or scalability. RDBMS optimization and scalability are problems that have already been solved. You inventing an ACID-compliant system that's better is unlikely.

I suggest Chris Date's book "Database In Depth: Relational Theory For Practitioners." It's written for programmers.

Posted to SQL Queries Kill Website Scalability over 1 year ago

PHP can already store session information in the file system, memcache, or redis. There's no reason to reinvent the wheel.

Adding a UUID to every table as a key is a terrible idea. You bloat your primary key and every index from 4 bytes for the usual INT UNSIGNED to 32 bytes for the UUID, and replace a fast integer comparison with a slow string comparison, for every key match. That might explain the JOIN problems you're having.

If you need to expose a database row ID in your URLs but want to make it tamper-proof it's better to use a salted SHA1 or even MD5 hash along with the ID. This is a fairly common practice to protect HTML FORM fields from tampering. Even better is using PHP's session storage so you aren't exposing raw database IDs in the browser.

Throwing out JOIN means not using a relational model (it's called unnormalized data). It may be appealing at first but it will cause duplicate and inconsistent data, and you'll be writing application code to make your app ACID-compliant. You are unlikely to get that right -- RDBMSs have a 30 year head start on you.

Writing that RDBMSs are not scalabale or that SQL queries "were the leading cause of website fatalities" in 2012 (where did you get that from?) just advertises inexperience. Every big enterprise application and almost every big web site runs on top of RDBMSs, leveraging several decades of reliability, redundancy, optimization, and scalability experience. Do you think your bank uses Mongo or some home-rolled UUID-based system without JOINs? No, they use Oracle.

In my considerable experience working with broken web sites and databases (that's what I do for a living) I'd say the leading cause of website fatalities is bad programming. If your database server is slow or crashing you should be looking at your database design and queries, not coming to the conclusion that RDBMSs suck. If I crash on my motorcycle it's alway my own fault, and I don't conclude that I know more about building motorcycles than Kawasaki does.

Achievements
1 Karma
0 Total ProTip Views