Benchmark Pagination Strategies in MongoDB

MongoDB is a document-based data store, and pagination is one of its most common use cases. So when do you paginate the response? The answer is simple: you paginate whenever you want to process results in chunks. Some common scenarios are:

  • Batch processing
  • Showing a huge set of results on a user interface

There are multiple approaches through which you can paginate your result set in MongoDB. This blog post is dedicated to the benchmark results of two of those approaches and their analysis, so here we go ...

The benchmark was run on a collection with no indexes other than the default one on _id. Each document in the collection looks something like this:

    {
        "_id" : ObjectId("5936d17263623919cd5165bd"),
        "name" : "Lisa Rogers",
        "marks" : 34
    }

All records of the collection are fetched page by page. The page size is fixed for the duration of a fetch of the collection. Each page is fetched 3 times, and the average of those 3 fetch times is recorded as the time to fetch one page.
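The averaging step can be sketched roughly as below. This is a minimal illustration and not the benchmark's actual code; the helper name `average_fetch_time` and its `fetch_page` callable are assumptions made for this example.

```python
import time
from statistics import mean


def average_fetch_time(fetch_page, repeats=3):
    """Call fetch_page `repeats` times and return the mean wall-clock time in seconds.

    `fetch_page` is a hypothetical zero-argument callable that fetches one page.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch_page()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

With PyMongo, the callable should materialize the page, for example `lambda: list(collection.find({}).skip(500).limit(10))`, because a cursor does not contact the server until it is iterated.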

The following image shows how the two approaches fare against each other.

MongoDB Pagination Benchmark Results

A key observation is that up to a count of roughly 500-600, both approaches are comparable, but beyond that threshold the response time of the skip and limit approach rises sharply compared to the other. The approach using _id and limit gives almost constant performance, independent of the size of the result set.

I ran this test on different machines with different disks, and the results were similar. I think diving deep into MongoDB's database driver will yield better information about this behavior. You may notice some spikes in the response times; these are because of disk contention.

In short:

  • For a huge result set, paginating using _id and limit is far better than using skip and limit.
  • For a smaller result set, it does not matter, but prefer skip and limit.
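For reference, the two approaches compared above can be sketched as follows. This is a hedged illustration and not the benchmark's actual code: the function names are hypothetical, sorting by `_id` in the skip and limit variant is an assumption, and `collection` is expected to behave like a PyMongo collection (only `find`, `sort`, `skip` and `limit` are used).

```python
def iter_pages_skip_limit(collection, page_size):
    """Yield pages using skip and limit: page k skips k * page_size documents."""
    page_number = 0
    while True:
        cursor = (
            collection.find({})
            .sort("_id", 1)  # assumed ordering, so both variants return the same pages
            .skip(page_number * page_size)
            .limit(page_size)
        )
        page = list(cursor)
        if not page:
            return
        yield page
        page_number += 1


def iter_pages_id_limit(collection, page_size):
    """Yield pages using _id and limit: each query resumes after the last _id seen."""
    last_id = None
    while True:
        # The first page has no lower bound; later pages start after last_id.
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        page = list(collection.find(query).sort("_id", 1).limit(page_size))
        if not page:
            return
        yield page
        last_id = page[-1]["_id"]
```

The second variant only moves forward through the collection, since each query depends on the last `_id` of the previous page, whereas skip and limit can jump straight to any page number.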

An interesting thing I observed is that once the page size crosses 100, the gap between the two approaches narrows to some extent. I have yet to run a detailed benchmark on that, as such a use case (a page size of more than 100) is pretty rare in practical applications.

You can find the Python code used for this benchmark here. If you have any suggestions or improvements, do let me know.

Arpit Bhayani
