
How to write an LRU cache: Master it in any programming language

TechMunching · Feb 20, 2021 · 8 min read

From an algorithm, data structure and interview’s perspective, caching makes for a versatile topic. It can be used to gauge someone’s low-level understanding of data structures and algorithms. It can also be turned around to challenge one’s high-level comprehension of distributed systems. Among caches, LRU is perhaps most widely used and accepted. We promise, after reading this post, you should be able to write one in your sleep.

Cache in your wardrobe


Table of Contents

  1. Introduction, what’s a cache really?
  2. What’s a Least Recently Used (LRU) Cache?
  3. Applications in System Design
  4. How to Build it in any programming language
  5. Building blocks
  6. Putting it all together

Introduction

Caches are a data storage layer that typically keeps data in memory for fast retrieval. They are generally implemented as a key-value store, meaning you store and access the data via an identifying key of some sort. The RAM on your computer is an example: the operating system stores data in RAM for faster access than the hard drive, and it uses the address of the memory cell as the key. Since that memory is expensive, its usage has to be highly optimized!
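To make the key-value idea concrete, here is a minimal sketch in C++ (the key "user:42" and the stored value are made up for illustration):

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
  // A cache is, at its core, a key-value store: you write data under a
  // key and read it back with the same key.
  std::unordered_map<std::string, std::string> cache;
  cache["user:42"] = "Ada Lovelace";      // store under a key
  std::cout << cache["user:42"] << '\n';  // retrieve via the same key
  return 0;
}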

One way to keep that usage optimized is the Least Recently Used (LRU) Cache Algorithm, one of the classic Page Replacement Algorithms. It decides which items stay in the cache and which get thrown out.

For example:

  1. A processor cache is used to read data faster from main memory (RAM).
  2. Cache in RAM can be used to store part of disk data in RAM and serve future requests faster.
  3. Network responses can be cached in RAM to avoid too many network calls.

What’s a Least Recently Used (LRU) Cache?

LRU caches are a specific type of cache with a unique feature: when an LRU cache runs out of space, it evicts the key-value pair that was least recently used. In other words, the LRU policy decides which element to throw out to make room for new elements once the cache is full.

Let’s understand this with an example:

When you are working on part of a coding project, you open some specific files, say project.cpp and project1.cpp, again and again. Meanwhile, if you were playing a game earlier, that game’s files are no longer being used.

File System Caching

An LRU cache gives priority to the files that were used most recently. The files that haven’t been touched in a while are removed, while those in active use are kept. Concretely, when a new file has to be cached and the cache is full, the least recently used file is evicted.

Popular open source examples of LRU caches are Redis and Memcached.


Applications in System Design

Slow or under-performing websites can result in millions in lost revenue, and querying databases, especially ones that hold a lot of data, can be quite slow.

Unfortunately, many of the same systems that rely on high uptime also have to store mountains of data. This is often the case for social media and e-commerce sites. These sites store their data in a database of some sort, be it SQL or NoSQL. While this is standard practice, the problem comes when you need to fetch that data.

Enter the cache…

Since caches keep data in memory, they are much more performant than traditional databases. And for social media sites, where 20% of the most popular content drives 80% of the traffic, caching can dramatically decrease the load on the databases.
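To sketch that pattern (often called cache-aside), here is a minimal lookup in C++; queryDatabase is a hypothetical stand-in for the slow call:

#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an expensive database query.
std::string queryDatabase(const std::string& userId) {
  return "profile for " + userId;  // imagine a slow network round trip here
}

std::unordered_map<std::string, std::string> cache;

// Cache-aside: serve from memory when possible; on a miss, hit the
// database and remember the result for next time.
std::string getUser(const std::string& userId) {
  if (auto it = cache.find(userId); it != cache.end())
    return it->second;                        // cache hit: no round trip
  std::string value = queryDatabase(userId);  // cache miss: slow path
  cache[userId] = value;
  return value;
}

int main() {
  std::cout << getUser("42") << '\n';  // miss: goes to the "database"
  std::cout << getUser("42") << '\n';  // hit: served from memory
  return 0;
}

Note that this plain map grows without bound; the LRU policy we build below is exactly what keeps such a cache at a fixed size.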

SQL vs. cache speed comparison (Image Credit: Dzone)

The next time an interviewer asks you how to optimize an API endpoint or workflow that fetches the same data over and over, caching is a good place to start. Knowing how to build one yourself is the hard part, and that’s exactly what we’ll be focusing on.


How to Build it in any programming language

Requirements

The first question is the API: which methods do we need to implement? While production-quality caches are feature-rich, we’ll keep it simple. You’ll only need to create two methods:

  1. get(key)
  2. set(key, value)

The LRU cache itself has the following functional requirements:

  1. When the max size is reached, remove the least recently used key.
  2. Whenever a key is fetched or updated, it becomes the most recently used.
  3. Both get and set operations must complete in O(1) time complexity.

No matter how large the cache grows, each operation takes the same amount of time.

With these requirements in mind, we can get to work.

Data structure

A cache with O(1) operations calls for a map / dictionary of some kind, but that isn’t enough on its own: we also need to keep track of access order, which calls for a list. Thus, to implement an LRU cache we use two data structures: a hashmap and a doubly linked list. The doubly linked list maintains the eviction order, and the hashmap provides O(1) lookup of cached keys. The list must be doubly linked so that an element can be unlinked from the middle and moved to the front in O(1). Here goes the algorithm for the LRU cache.

Pseudocode for LRU cache

If the element exists in the hashmap
    move the element to the start of the list

Else
    if the cache is already full
        remove the last element from the list and delete its hashmap entry
    add the new element at the start of the list and in the hashmap as well

Get from cache and return

Note that the list keeps track of how recently each element was accessed: the element at the start of the doubly linked list is the most recently accessed, and the one at the end is the next eviction candidate.

LRU Cache Sample iterations
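To make the pseudocode concrete, here is a short trace with a capacity of 2 (the keys A, B, C are arbitrary): set(A) and set(B) fill the cache, leaving the order B, A from front to back. get(A) moves A to the front, so the order becomes A, B and B is now the least recently used. set(C) finds the cache full, evicts B from the back, and places C at the front, leaving C, A.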


Building blocks

  1. Doubly Linked List => For the C++ code, we’ll use std::list
  2. Hashmap => For the C++ code, we’ll use std::unordered_map
  3. API to handle cache size overflow
  4. API to handle move element to start
  5. APIs to Get and Set the actual data.

Putting it all together

Here’s the final code for the LRU cache. First, let’s just write the class structure -

Blocks 1 & 2: A class that uses a hashmap and a doubly linked list


#include <cstddef>
#include <iostream>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

template <typename keyT, typename valueT>
class LRUCache {
  public:
    // Alias the more complicated datatypes for ease of reading the code.
    using Pair = std::pair<keyT, valueT>;
    using ListIterator = typename std::list<Pair>::iterator;

    LRUCache(std::size_t maxSize) noexcept : _maxSize(maxSize) {}

    // .... Public methods (below)

  private:
    // ... Private helpers (below)

    // Front of the list = most recently used, back = eviction candidate.
    std::list<Pair> _keyValuePairList;
    // Maps each key to its node in the list for O(1) lookup.
    std::unordered_map<keyT, ListIterator> _cacheMap;
    std::size_t _maxSize;
};

Block 3: Handle cache size overflow and element movement

Let’s write some helper functions; these go in the private section -


    private:

    void moveToStart(ListIterator& iterator) {
      // splice relinks the node to the front of the same list in O(1),
      // without copying the element.
      _keyValuePairList.splice(_keyValuePairList.begin(), _keyValuePairList, iterator);
    }

    void checkCacheSize() {
      // If we have grown past the limit, drop the least recently used
      // entry: the pair at the back of the list.
      if (_maxSize > 0 && _keyValuePairList.size() > _maxSize) {
        _cacheMap.erase(_keyValuePairList.back().first);
        _keyValuePairList.pop_back();
      }
    }
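A note on the design: moveToStart relies on std::list::splice, which relinks an existing node rather than copying it. Splicing within the same list also leaves iterators valid, so the iterators stored in the hashmap keep pointing at the right nodes after every reordering.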

Block 4a: Set API for key and value

Now, let’s implement the public APIs for this class. The first one is set -


    // Move version of set: emplace_front transfers the key and value
    // directly into the list instead of copying them.
    void set(keyT&& key, valueT&& value) {
      if (_cacheMap.find(key) != _cacheMap.end()) {
        // The key already exists: update the value through the iterator
        // stored in the map, then move the pair to the front to delay
        // eviction.
        _cacheMap[key]->second = std::move(value);
        moveToStart(_cacheMap[key]);
      } else {
        // New key: add it to the front of the list and record its
        // position in the map. The key is read back from the list node
        // because the local `key` has just been moved from.
        _keyValuePairList.emplace_front(std::move(key), std::move(value));
        _cacheMap[_keyValuePairList.front().first] = _keyValuePairList.begin();
      }
      // Evict the least recently used entry if we went over capacity.
      checkCacheSize();
    }

    // Copy version of set, for callers that keep their key and value.
    void set(const keyT& key, const valueT& value) {
      if (_cacheMap.find(key) != _cacheMap.end()) {
        _cacheMap[key]->second = value;
        moveToStart(_cacheMap[key]);
      } else {
        _keyValuePairList.push_front({key, value});
        _cacheMap[key] = _keyValuePairList.begin();
      }
      checkCacheSize();
    }

Block 4b: Get API and element checking


    const valueT& get(const keyT& key) {
      if (!exists(key))
        throw std::range_error("Key not present");
      // Move this item to the front of the list to delay its eviction.
      moveToStart(_cacheMap[key]);
      return _cacheMap[key]->second;
    }

    bool exists(const keyT& key) const {
      return _cacheMap.find(key) != _cacheMap.end();
    }
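To see the cache in action, here is a small driver; it assumes the class above is in scope, and the keys and values are arbitrary:

int main() {
  LRUCache<int, std::string> cache(2);    // room for two entries

  cache.set(1, "one");
  cache.set(2, "two");
  std::cout << cache.get(1) << '\n';      // prints "one"; key 1 is now most recent

  cache.set(3, "three");                  // cache is full: evicts key 2, the LRU entry
  std::cout << std::boolalpha
            << cache.exists(2) << '\n';   // prints "false": key 2 was evicted
  std::cout << cache.get(3) << '\n';      // prints "three"
  return 0;
}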

That’s it, we’re done. Yay!! We hope you found this deep dive into LRU caches useful and informative. Now you’ll be able to build one in your sleep.


References:

  1. https://dzone.com/articles/redis-vs-mysql-benchmarks
  2. https://course.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=onur-447-spring15-lecture18-caches-afterlecture.pdf
  3. https://stackoverflow.com/questions/2504178/lru-cache-design

Thanks for reading this article! Feel free to leave a comment and let us know what you think, or how we could improve it.

Please check out our other articles and website. Have a great day!
