Where developers come to connect, share, build and be inspired.

113

Pagination: You're (Probably) Doing It Wrong.

18746 views

Pagination? No problem! SELECT ... LIMIT X, Y Right?

Hah! Not quite!

You see, your pagination logic is not stable; and that matters quite a bit in this ajaxy, client-appy, infinite-scrolly world.

TL;DR?

  • Don't paginate on row offset.
  • Find a stable value to paginate on, like an item's date.
  • Always deduplicate list items on the client side.
  • Never trust your server to know exactly what the user is looking at.

What are you going on about?

It's probably best to give an example. Let's say you're adding comments to a page, and you want the most recent comments to show up first. The naive approach is to construct a query similar to:

SELECT * FROM comments ORDER BY date DESC LIMIT 0, 10

Easy. That gives you the first 10 most recent comments. Nothing wrong here. Now, the user pages down and you perform the same query, except for LIMIT 10, 10 - and you return the next 10 comments...

...sometimes.

Consider this:

  • User A is viewing a page with the first 10 comments loaded.
  • User B creates a comment.
  • User A loads the next page of comments.

Well, crap. Now User A sees a duplicate comment!

The last comment on their first page got pushed down by the new comment, and it became the 11th comment in the list. Now your page just looks sloppy ☹

A similar situation occurs if you allow for removal of items from a list, but with even more disastrous results.

Instead of the comments being pushed down, they get pulled up. The item that would have previously been first on the next page is shifted up; and User A would completely skip over it! That really sucks ☹ ☹

Stable pagination

Don't fret, this is usually pretty easy to solve. Instead of paginating on a volatile offset (the row number), find a value that is stable across time. In the comments example, that is the date of the comments themselves.

Take the example above: Your first query is the same. However, for the next page, instead of passing the row offset you pass the date of the last comment in the list:

SELECT * FROM comments WHERE date < prevous_date LIMIT 10

Perfect! Now you can be sure that the user will always see the very next set of commits that occurred after the one they are looking at.

Sometimes...

Caveat entroptor

What if you have two comments with the same date? Ack! Pagination is hard.

You've got a few options:

If you don't have an auto incrementing id (e.g. most distributed data stores), then things are a bit more complicated:

  • Instead of finding comments that occurred before the date, find comments that occurred at or before that date. Then, make sure you filter out duplicate comments before displaying them to the user.

or:

  • Pass additional state that you can use to ignore the comments the user has already seen (maybe the ids of any returned comments with that date).

The first option is probably the most straightforward, even though it requires that a fair amount of duplicate data flies over the wire.

That's great, but what about this?

Yeah. There's also there's this frequently seen control:

[<<] [<] [1] 2 [3] ... [>] [>>]

How do you jump to a page if you're paginating with a stable offset like the comment date? You don't.

Before I go any further, answer this: How frequently do your users actually jump to a specific page other than next/prev/first/last?

I'd challenge you to avoid a design where the user needs to jump to arbitrary pages. There's plenty of other methods that allow a user to more effectively seek to the content they want (filtering being the most common approach).

However, this raises some other interesting questions, doesn't it?

Skip to the end!

With this world of pagination, how do I let the user skip to the end? Ack! Pagination is hard.

Well, if you have to support skipping to the end of a list, one possibilty: Flip your sort order around and get the first 10 items from the other direction. Not so bad!

Similarly, how do you let the user go back a page? Ack! Pagination is really hard.

Thankfully, you can adapt the approach for getting the last page to this case. After all, it's just a regular list that you happen to be sorting a different way: Take the date of the item at the top of the list that the user is looking at, and get the page of items before it! Neat!

Make sure this is right for you, though. Some situations arise where you can't easily reverse the order, and then you need to figure out more novel approaches (binary search! chunking! who knows?!?).

Unstable pagination

Stable pagination is not a silver bullet. Consider sites like Reddit and Hacker News. Their pagination is inherently unstable: items are sorted by a volatile ranking, frequently moving up and down the list over time. How do they deal with it? They don't.

Both sites frequently showcases how bad it can look by showing duplicate items as you advance pages. It's not all that bad, because neither site subscribes to the infinite scrolling crowd (yet?).

However, there are plenty of client apps out there that present both Reddit and Hacker News as an infinite scrolling feed; and some of them just show duplicate items stacked on top of each other as you scroll down. Amateur hour! All they need to do to fix that half of the problem is to ignore duplicate items when appending the next page.

There's still the possibility of completely hiding items from you as they fall down in ranking. That's more difficult to solve; the most straightforward route would be to return the full list down to the user's position each time, but that's crazy! It's probably best to just ignore that case.

Comments

  • Blank-mugshot
    thrashr888

    Another way would be to use a stream/timeline like the Twitter API uses. It works similari to the date method but doesn't have 2 items with the same date. With this you don't request page=2, you request the first set with a certain count and it gives you the sinceid and maxid. To get the next page you request a certain count starting at the cursor (max_id). It doesn't let you sort or jump to the end so it is still a pain to get your whole tweet history. Read more: https://dev.twitter.com/docs/working-with-timelines

  • 0690eb8746a91fac2464a0a482f36fed_normal
    toddmorey

    Interesting discussion. I hadn't thought about how unstable certain lists can be and the impact that has on pagination. One quirk of hacker news is how the previous page link can "expire." Seems they could handle that more gracefully.

  • Blank-mugshot
    clord

    I just paginate from the start of time, not the end. A book starts with the 'oldest' page first, and goes to the 'newest', or last, page. So what if your blog starts on page 63? Why does the newest stuff need to be page 1?

    Start from page 1, and build pages from there. Have a 'last 10 posts' default view. Nice and stable. Works better for caching too. (once you build a page, you can cache it forever)

  • 0_jhxb8kogqv4fnhqzutmh8fogeud29aqzucwh8fd_vjjpdel4v67uabgskns1vdbn4igw78bbijel
    stasmo

    I think if you're at the point where you have hundreds of users, you should consider implementing a filter.

  • Transparent-large
    nevir

    @clord that's only stable as long as you don't allow removals from the list; but yeah, simplicity is often the best design!

  • Blank-mugshot
    clord

    @nevir If I'm going to bust a bunch of caches, I want to do it as infrequently as possible. What's more common, adding posts, or removing them? Optimize for that case.

  • Photo-fb
    haliphax

    Paginate by ID, rather than some arbitrary sum. page=2&id=107 would mean "I'm on page 2" for aesthetics, but the server-side app would know that page 2 begins at the item with ID# 107 (or the next greater ID# after 107 if it has been deleted).

  • 0_ljrrcpgupxlagbeaa0i7ca_rpkloc-da1vx7crhbit5agnoljrhpmknw-oa33zhjko-a4btde9oo
    vinceve

    If you're writing software where the user needs to manage a lot of data (not just scrolling through a list of data) this approach isn't handy at all for the user. Example: you have a user management system and certain users need to be assigned a specific value. As an admin I want to find my users asap, with oldskool pagination I would find my users almost instantly. If I need to scroll through hundred of users it would maybe take half an hour. :-)

    But nice article I think some elements are definitely true!

  • Strand-luca-001_normal
    robiii

    For comments / tweets / etc. I'll prefer the most pragmatic solution every time over a more convoluted solution; if that means showing a comment (maybe) twice: so be it. The sinceid-method mentioned by thrashr888 works good for this kind of data also. For many other types of paginated data (customers / orders / etc. ) I even suggest not using pagination at all. I blogged about that not too long ago: http://robthree.blogspot.com/2012/12/about-pagination-of-data.html

Add a comment