LRU缓存机制与 LeetCode 146

最近最少使用算法，或 最近最久未使用算法 （Least Recently Used, LRU）是一种常用的页面置换算法（当然也可以用于文件缓存等情景）。再实际情况中，如某些页面长时间未被访问，则它们在将来还可能会长时间不会访问。算法基于这个推论，选择最近最久未使用的页面予以淘汰。

我们认为 “最近使用” 指的是读/写相应的内存页（或值），然后以一个例子（来自参考资料1）看看：

程序按时间顺序访问内存页 $c, a, d, b, e, b, a, b, c, d$ ，1 ~ 4 时刻内存页均能命中，当访问到 $e$ 时引发缺页中断。因为加载的页已满，需要将内存中的某个页面置换出去。这时候，检索已存在内存中的页面 $a, b, c, d$ 的最后访问时间， $c$ 为最近最少使用的页，于是换入页面 $e$ 来代替 $c$ ，然后程序继续执行。

实际系统中由于 LRU 算法所需开销太大，一般使用它的近似算法，并且在操作系统虚拟存储管理中，使用硬件做辅助。

程序设计

LeetCode 146 对这个问题做了简化：

运用你所掌握的数据结构，设计和实现一个 LRU (最近最少使用) 缓存机制。它应该支持以下操作：获取数据 get 和写入数据 put 。

获取数据 get(key) - 如果密钥 (key) 存在于缓存中，则获取密钥的值（总是正数），否则返回 -1。
写入数据 put(key, value) - 如果密钥已经存在，则变更其数据值；如果密钥不存在，则插入该组「密钥/数据值」。当缓存容量达到上限时，它应该在写入新数据之前删除最久未使用的数据值，从而为新的数据值留出空间。

略加思索，我们考虑这样几种模拟实现。

实现一

一个很自然的想法是，使用一个数组存储页表，并页表中的每一页添加最近使用时间 last_use 属性，“使用”（读 / 写）页时更新它的值，如：

if (content[i].key == key)
{
    content[i].last_use = (++current_time);
    return content[i].value;
}

当容量满时，只要找到最小 last_use 的对应项并删除即可。这样设计时，对页面的读、写和置换时算法复杂度都将是 $O(n)$ 。

一个改进策略是，使用一个不那么严格的栈，保证栈中各个元素 last_use 有序，可以减少置换时的耗时。

实现二

为了进一步优化访问时间，可以采用一个链表加快页面置换的速度。不妨将链表的头作为最近访问的页面，那么链表尾部就是最久未使用的页。访问内存时，找到相应页面并移到首部，缺页时，置换尾节点的页面。

该方法查表时间为 $O(n)$ ，最坏情况即缺页时，但是淘汰旧页面的速度为 $O(1)$ ，而上一个思路为 $O(n)$ 。

实现三

LeetCode 146 上普遍采用使用哈希表 + 双向链表的思路：我们用哈希表将 key 映射到双向链表中的对应节点，因为采用了双向链表，可以无需遍历而移动任何一个节点，因此实现修改和查询均为 $O(1)$ 的时间复杂度。（图源水印，参考资料2，侵删）

代码如下：

// Use a linked list to store <key, value>
std::list<std::pair<int, int>>  table_;
// unordered_map for index_
std::unordered_map<int, std::list<std::pair<int, int>>::iterator>  index_;

在实现中，table_ 的单个元素是 std::pair<int, int> 类型。前一个 int 存储 key，后一个 int 存储 value，这是为了方便淘汰旧表项时，可以快速地找到并在 index_ 中同步删除 key。

get 函数如下：

int get(int key)
{
    /* Looking for the value and push it to the end of the list. */
    auto  map_iter = index_.find(key);
    if (map_iter != index_.end())  // found
    {
        auto  table_iter = map_iter->second;
        std::pair<int, int>  kv_pair = *table_iter;
    
        // Push the value to the end of the table_.
        table_.erase(table_iter);
        table_.push_back(kv_pair);
        index_[key] = std::prev(table_.end());
        return kv_pair.second;
    }
    return -1;
}

put函数如下：

void put(int key, int value)
{
    const auto map_iter = index_.find(key);
    // If the value is existed in map.
    if (map_iter != index_.end())
    {
        table_.erase(map_iter->second);
        table_.emplace_back(std::pair<int, int>(key, value));
        // Update map.
        index_[key] = std::prev(table_.end());
    }
    else  // Not found.
    {
        // If cache is full, die out the oldest one
        if (capacity == table_.size())
        {
            int  key_to_remove = table_.front().first;
            index_.erase(key_to_remove);
            table_.pop_front();
        }
        table_.emplace_back(std::pair<int, int>(key, value));
        index_[key] = std::prev(table_.end());
    }
}

这里，参考资料2（labuladong）的实现中，最近访问的元素放到了 std::list 的头部，这样再将其写入索引 index_ 时可以直接存储迭代器 table_.begin()，而不用像我例子中 std::prev(table_.end())，算是一个小的优化吧。

完整代码： lru.cpp

参考资料

第七讲虚拟存储：局部页面置换算法, 清华大学操作系统课程主页, https://os.cs.tsinghua.edu.cn/oscourse/OS2020spring/lecture07
LRU 策略详解与实现（LeetCode 题解）, labuladong, https://leetcode-cn.com/problems/lru-cache/solution/lru-ce-lue-xiang-jie-he-shi-xian-by-labuladong/