Pagination? No problem!
SELECT ... LIMIT X, Y Right?
Hah! Not quite!
You see, your pagination logic is not stable; and that matters quite a bit in this ajaxy, client-appy, infinite-scrolly world.
- Don't paginate on row offset.
- Find a stable value to paginate on, like an item's date.
- Always deduplicate list items on the client side.
- Never trust your server to know exactly what the user is looking at.
What are you going on about?
It's probably best to give an example. Let's say you're adding comments to a page, and you want the most recent comments to show up first. The naive approach is to construct a query similar to:
SELECT * FROM comments ORDER BY date DESC LIMIT 0, 10
Easy. That gives you the first 10 most recent comments. Nothing wrong here. Now, the user pages down and you perform the same query, except for
LIMIT 10, 10 - and you return the next 10 comments...
- User A is viewing a page with the first 10 comments loaded.
- User B creates a comment.
- User A loads the next page of comments.
Well, crap. Now User A sees a duplicate comment!
The last comment on their first page got pushed down by the new comment, and it became the 11th comment in the list. Now your page just looks sloppy ☹
A similar situation occurs if you allow for removal of items from a list, but with even more disastrous results.
Instead of the comments being pushed down, they get pulled up. The item that would have previously been first on the next page is shifted up; and User A would completely skip over it! That really sucks ☹ ☹
Don't fret, this is usually pretty easy to solve. Instead of paginating on a volatile offset (the row number), find a value that is stable across time. In the comments example, that is the date of the comments themselves.
Take the example above: Your first query is the same. However, for the next page, instead of passing the row offset you pass the date of the last comment in the list:
SELECT * FROM comments WHERE date < prevous_date LIMIT 10
Perfect! Now you can be sure that the user will always see the very next set of commits that occurred after the one they are looking at.
What if you have two comments with the same date? Ack! Pagination is hard.
You've got a few options:
- If you are using an auto incrementing identifier for your content, use that instead of the date; it's guaranteed to increase over time and behave nicely for these cases. At that point, you're going down the path of implementing cursors.
If you don't have an auto incrementing id (e.g. most distributed data stores), then things are a bit more complicated:
- Instead of finding comments that occurred before the date, find comments that occurred at or before that date. Then, make sure you filter out duplicate comments before displaying them to the user.
- Pass additional state that you can use to ignore the comments the user has already seen (maybe the ids of any returned comments with that date).
The first option is probably the most straightforward, even though it requires that a fair amount of duplicate data flies over the wire.
That's great, but what about this?
Yeah. There's also there's this frequently seen control:
[<<] [<]  2  ... [>] [>>]
How do you jump to a page if you're paginating with a stable offset like the comment date? You don't.
Before I go any further, answer this: How frequently do your users actually jump to a specific page other than next/prev/first/last?
I'd challenge you to avoid a design where the user needs to jump to arbitrary pages. There's plenty of other methods that allow a user to more effectively seek to the content they want (filtering being the most common approach).
However, this raises some other interesting questions, doesn't it?
Skip to the end!
With this world of pagination, how do I let the user skip to the end? Ack! Pagination is hard.
Well, if you have to support skipping to the end of a list, one possibilty: Flip your sort order around and get the first 10 items from the other direction. Not so bad!
Similarly, how do you let the user go back a page? Ack! Pagination is really hard.
Thankfully, you can adapt the approach for getting the last page to this case. After all, it's just a regular list that you happen to be sorting a different way: Take the date of the item at the top of the list that the user is looking at, and get the page of items before it! Neat!
Make sure this is right for you, though. Some situations arise where you can't easily reverse the order, and then you need to figure out more novel approaches (binary search! chunking! who knows?!?).
Stable pagination is not a silver bullet. Consider sites like Reddit and Hacker News. Their pagination is inherently unstable: items are sorted by a volatile ranking, frequently moving up and down the list over time. How do they deal with it? They don't.
Both sites frequently showcases how bad it can look by showing duplicate items as you advance pages. It's not all that bad, because neither site subscribes to the infinite scrolling crowd (yet?).
However, there are plenty of client apps out there that present both Reddit and Hacker News as an infinite scrolling feed; and some of them just show duplicate items stacked on top of each other as you scroll down. Amateur hour! All they need to do to fix that half of the problem is to ignore duplicate items when appending the next page.
There's still the possibility of completely hiding items from you as they fall down in ranking. That's more difficult to solve; the most straightforward route would be to return the full list down to the user's position each time, but that's crazy! It's probably best to just ignore that case.