Why caching would not speed up Mark-and-Sweep GC?

801 views Garbage Collection

Mark and Sweep GC is one of the most common garbage collection algorithms that has seen a massive adoption. The algorithm has two phases:

  • Mark Phase: we iterate through the object reference graph and mark all the reachable objects as live; and
  • Sweep Phase: we iterate through all the objects in the main memory and delete all the objects that are not marked live; thus cleaning up the garbage

We need Garbage Collectors to be fast

Garbage collectors need to be fast because we want the CPU cycles to be used in running the core business logic of the user program and not cleaning up unused variables.

What is a Cache?

A cache is anything that stores the data so that future requests are served faster. The data we can cache can be

  • results from the previous computation
  • a copy of the data from a slower storage

When does caching improve performance?

Caching improves the performance of an application only when the use case exhibits one of the following two behaviors

Temporal Locality

A recently accessed memory location is likely to be accessed again.

By caching the data, we can thus serve the information from the cache instead of doing an expensive lookup or computation.

Spatial Locality

If a location is recently accessed, the location adjacent to that is likely to be accessed soon.

By caching the data, we can thus serve the information from the cache instead of going through slower storage like disk or main memory.

Hardware and Cache Prefetching

To leverage caching, modern hardware pre-fetches the data, likely to be accessed, from the slower storage and caches it. This boosts the performance of the application.

There are two ways to pre-fetch:

  • hardware intelligently pre-fetches the data as per the access pattern
  • hardware exposing “prefetch” instruction leaving to the business logic to prefetch

Why caching would not improve GC performance?

As we established, for caching to improve the performance our use-case would need to exhibit either spatial or temporal locality. Do mark and sweep exhibit either of them?

Mark n Sweep and Temporal Locality

Mark and Sweep GC does not exhibit temporal locality given that we mark all the reachable objects in one iteration and in other iteration we go through the unreachables one and free them.

Mark n Sweep and Spatial Locality

Mark and Sweep GC does not exhibit spatial locality given we we do a DFS on object reference tree which hopes from one object too another in random order. So pre-fetching of memory locations would not help.

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

Why caching would not speed up Mark-and-Sweep GC?

801 views 47 likes 2022-05-20

So, caching doesn't always work! What would happen if we apply caching to speed up our Mark and Sweep Gargabge collecto...

Tricolor Abstraction to build concurrent Garbage Collectors

840 views 43 likes 2022-05-06

A basic Mark-and-Sweep garbage collection algorithm operates in Stop-the-World mode, which means the program execution p...

Mark and Sweep Garbage Collection Algorithm

1175 views 60 likes 2022-04-29

Garbage Collection has to be one of the most interesting topics of discussion out there. In the previous videos, we took...

How to pick a garbage collector?

1046 views 50 likes 2022-04-22

"Best" Garbage Collector is a myth. If you are building your Garbage Collector or trying to pick the best for your use ...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

800+ learners

Details →

Designing Microservices

A free playlist to help you understand Microservices and their high-level patterns in depth.

17+ learners

Details →

GitHub Outage Dissections

A free playlist to help you learn core engineering from outages that happened at GitHub.

67+ learners

Details →

Hash Table Internals

A free playlist to help you understand the internal workings and construction of Hash Tables.

25+ learners

Details →

BitTorrent Internals

A free playlist to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.7.8
  • © Arpit Bhayani, 2022

Powered by this tech stack.