How to make underlying cache methods available in timed LRU cache in Python?
Caching can greatly speed up loading frequently used data (for example, S3 objects). Python 3 ships simple unbounded and LRU caches in the functools module, called cache (Python 3.9 and later) and lru_cache respectively (functools documentation). Both expose useful methods such as cache_clear and cache_info. Clearing the cache between tests is especially important, because values cached by one test can leak into the next and produce strange results (a good article that describes this problem).
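As a quick illustration of those two methods, here is a minimal sketch. The function load_object is a hypothetical stand-in for an expensive load and is not taken from any real project.

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def load_object(key: str) -> str:
    """Hypothetical stand-in for an expensive load such as an S3 read."""
    return f"payload for {key}"


load_object("a")  # miss: the function body runs
load_object("a")  # hit: served from the cache
load_object("b")  # miss: the function body runs

info = load_object.cache_info()
print(info)  # CacheInfo(hits=1, misses=2, maxsize=32, currsize=2)

# Typically called in a test fixture so cached values do not leak between tests.
load_object.cache_clear()
print(load_object.cache_info().currsize)  # 0
```

Note that cache_clear also resets the hit and miss counters, not just the stored entries.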
If you expect your data to change during program execution, it makes sense to implement a timed cache. Unfortunately, the Python 3 standard library doesn't provide a timed cache, so we need to implement it ourselves.
Searching the Internet, you can find many different implementations, but they are all pretty much the same. Let's take an example from a great article about caching on the Real Python website.
Implementation of the timed cache from the article:
    from functools import lru_cache, wraps
    from datetime import datetime, timedelta


    def timed_lru_cache(seconds: int, maxsize: int = 128):
        def wrapper_cache(func):
            func = lru_cache(maxsize=maxsize)(func)
            func.lifetime = timedelta(seconds=seconds)
            func.expiration = datetime.utcnow() + func.lifetime

            @wraps(func)
            def wrapped_func(*args, **kwargs):
                if datetime.utcnow() >= func.expiration:
                    func.cache_clear()
                    func.expiration = datetime.utcnow() + func.lifetime
                return func(*args, **kwargs)

            return wrapped_func

        return wrapper_cache
Note: the cache is cleared only when the decorated function is called and the expiration check runs. Do not expect the cache to be cleared exactly at the expiration time.
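    @timed_lru_cache(seconds=10)  # 10 is an arbitrary example lifetime
    def load_key_from_s3(key: str):
        # .
        ...

Warning: the parentheses are important. A bare @timed_lru_cache will not work. The decorator has to be called, and seconds has no default value, so pass it explicitly, for example @timed_lru_cache(seconds=10).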
But there is an issue: functools.wraps copies the original function's metadata, such as its name and docstring, but not the cache_clear and cache_info methods of the lru_cache wrapper (documentation on wraps). Calling, for example, load_key_from_s3.cache_clear() will fail with an AttributeError. What if we want to expose the missing methods for testing and statistics? The simplest fix to the above implementation is the following:
    from functools import lru_cache, wraps
    from datetime import datetime, timedelta


    def timed_lru_cache(seconds: int, maxsize: int = 128):
        def wrapper_cache(func):
            func = lru_cache(maxsize=maxsize)(func)
            func.lifetime = timedelta(seconds=seconds)
            func.expiration = datetime.utcnow() + func.lifetime

            @wraps(func)
            def wrapped_func(*args, **kwargs):
                if datetime.utcnow() >= func.expiration:
                    func.cache_clear()
                    func.expiration = datetime.utcnow() + func.lifetime
                return func(*args, **kwargs)

            # add missing methods to wrapped function
            wrapped_func.cache_clear = func.cache_clear
            wrapped_func.cache_info = func.cache_info
            return wrapped_func

        return wrapper_cache
Now both the cache_clear and cache_info methods are exposed on the load_key_from_s3 function. Thank you for reading.
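A small usage sketch may make the behavior more concrete. It repeats the decorator so that it runs on its own, with two changes that are assumptions of this sketch and not part of the article: datetime.now(timezone.utc) replaces datetime.utcnow(), which is deprecated in recent Python versions, and seconds accepts a float so the demo can expire quickly. The function load_key is hypothetical.

```python
import time
from datetime import datetime, timedelta, timezone
from functools import lru_cache, wraps


def timed_lru_cache(seconds: float, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.now(timezone.utc) + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.now(timezone.utc) >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.now(timezone.utc) + func.lifetime
            return func(*args, **kwargs)

        # expose the lru_cache helpers on the outer wrapper
        wrapped_func.cache_clear = func.cache_clear
        wrapped_func.cache_info = func.cache_info
        return wrapped_func

    return wrapper_cache


@timed_lru_cache(seconds=0.2)
def load_key(key: str) -> str:
    """Hypothetical stand-in for an expensive load."""
    return key.upper()


load_key("a")
load_key("a")
first = load_key.cache_info()  # one miss, then one hit

time.sleep(0.3)
stale = load_key.cache_info()  # the entry is still stored: nothing has called load_key yet
load_key("a")                  # this call notices the expiry and clears the cache first
after = load_key.cache_info()  # counters were reset by cache_clear: one miss, no hits

# functools.wraps also stores the wrapped callable as __wrapped__,
# so the same statistics are reachable without the extra assignments.
same = load_key.__wrapped__.cache_info() == after

load_key.cache_clear()  # e.g. between tests
```

The stale reading illustrates the earlier note: expired entries stay in memory until the next call triggers the check.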
Morreski / timed_cache.py
    from datetime import datetime, timedelta
    import functools


    def timed_cache(**timedelta_kwargs):
        def _wrapper(f):
            update_delta = timedelta(**timedelta_kwargs)
            next_update = datetime.utcnow() + update_delta
            # Apply @lru_cache to f with no cache size limit
            f = functools.lru_cache(None)(f)

            @functools.wraps(f)
            def _wrapped(*args, **kwargs):
                nonlocal next_update
                now = datetime.utcnow()
                if now >= next_update:
                    f.cache_clear()
                    next_update = now + update_delta
                return f(*args, **kwargs)

            return _wrapped

        return _wrapper
    from datetime import timedelta, datetime
    import functools
Great piece of code. Thanks!!
Thanks for your feedback! And for mentioning the imports. 🙂
So simple yet so useful! Thanks @Morreski! 👍
Thanks @Morreski! Take a look at this modification to support passing arguments to the underlying lru_cache method: https://gist.github.com/jmdacruz/764bcaa092eefc369a8bfb90c5fe3227
Added support for lru_cache's maxsize and typed arguments.
    from datetime import datetime, timedelta
    import functools


    def timed_cache(**timed_cache_kwargs):
        def _wrapper(f):
            maxsize = timed_cache_kwargs.pop('maxsize', 128)
            typed = timed_cache_kwargs.pop('typed', False)
            update_delta = timedelta(**timed_cache_kwargs)
            next_update = datetime.utcnow() - update_delta
            f = functools.lru_cache(maxsize=maxsize, typed=False)(f)

            @functools.wraps(f)
            def _wrapped(*args, **kwargs):
                nonlocal next_update
                now = datetime.utcnow()
                if now >= next_update:
                    f.cache_clear()
                    next_update = now + update_delta
                return f(*args, **kwargs)

            return _wrapped

        return _wrapper
I think it should be next_update = datetime.utcnow() + update_delta. In fact it does not change the correctness of the solution, since the current code just forces a flush on the first call. The flush is simply not needed, and if the code is copy-pasted into another context it could be wrong.
    from datetime import datetime, timedelta
    import functools


    def timed_cache(**timedelta_kwargs):
        def _wrapper(f):
            update_delta = timedelta(**timedelta_kwargs)
            next_update = datetime.utcnow() + update_delta
            # Apply @lru_cache to f with no cache size limit
            f = functools.lru_cache(None)(f)

            @functools.wraps(f)
            def _wrapped(*args, **kwargs):
                nonlocal next_update
                now = datetime.utcnow()
                if now >= next_update:
                    f.cache_clear()
                    next_update = now + update_delta
                return f(*args, **kwargs)

            return _wrapped

        return _wrapper
f = functools.lru_cache(maxsize=maxsize, typed=False)(f)
There should be typed=typed instead of typed=False
In general, a nice piece of code, but what's the point of clearing the whole cache after the timeout? To me, the timeout should apply to individual results.
I agree, I was hoping for a recipe for a per-element expiration, this example is far too heavy-handed, as it clears the ENTIRE cache if any individual element is outdated.
@Spaider @linclelinkpart5
Here is a version that supports per-element expiration.
Since the official «lru_cache» doesn't offer an API to remove a specific element from the cache, I had to re-implement it. Most of the code is taken from the original «lru_cache», except for the expiration logic and the «Node» class that implements the linked list. (The official version implements the linked list with arrays.)
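The commenter's code is not reproduced here. As a rough idea of what per-element expiration can look like, here is a minimal sketch that takes a different, simpler route: an OrderedDict instead of a re-implemented linked list. The names ttl_lru_cache, timer and load are hypothetical, the sketch is not thread-safe, and it assumes all arguments are hashable.

```python
import time
from collections import OrderedDict
from functools import wraps


def ttl_lru_cache(seconds: float, maxsize: int = 128, timer=time.monotonic):
    """Hypothetical decorator: an LRU cache whose entries expire individually."""
    def decorator(func):
        entries = OrderedDict()  # key -> (expires_at, value), least recently used first

        @wraps(func)
        def wrapped(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = timer()
            hit = entries.get(key)
            if hit is not None and hit[0] > now:
                entries.move_to_end(key)     # mark as most recently used
                return hit[1]
            value = func(*args, **kwargs)    # missing or expired: recompute
            entries[key] = (now + seconds, value)
            entries.move_to_end(key)
            if len(entries) > maxsize:
                entries.popitem(last=False)  # evict the least recently used entry
            return value

        wrapped.cache_clear = entries.clear
        return wrapped

    return decorator


# Usage with a fake clock so the example does not need to sleep.
clock = [0.0]
calls = []


@ttl_lru_cache(seconds=10, maxsize=2, timer=lambda: clock[0])
def load(key):
    calls.append(key)
    return key.upper()


load("a")        # cached until t=10
clock[0] = 6.0
load("b")        # cached until t=16
clock[0] = 11.0
load("a")        # expired, so the function runs again
load("b")        # still fresh, served from the cache
print(calls)     # ['a', 'b', 'a']
```

Only the entry for "a" is recomputed at t=11. A whole-cache timeout would have dropped "b" as well.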
Thank you for this! I used it in a project where we have 100% test coverage so I wrote this simple test for it.
Thought it could be useful for others as well.
    import unittest


    class Testing(unittest.TestCase):
        def test_timed_cache(self):
            """Test the timed_cache decorator."""
            from python_file import timed_cache
            import logging
            import time

            cache_logger = logging.getLogger("foo_log")

            @timed_cache(seconds=1)
            def cache_testing_function(num1, num2):
                cache_logger.info("Not cached yet.")
                return num1 + num2

            with self.assertLogs("foo_log", level="INFO") as cache_log:
                result1 = cache_testing_function(2, 3)
                self.assertEqual(cache_log.output[0], "INFO:foo_log:Not cached yet.")
                assert result1 == 5

                result2 = cache_testing_function(2, 3)
                assert len(cache_log.output) == 1
                assert result2 == 5

                time.sleep(1)

                result3 = cache_testing_function(2, 3)
                self.assertEqual(cache_log.output[1], "INFO:foo_log:Not cached yet.")
                assert result3 == 5
I think it should be next_update = datetime.utcnow() + update_delta. In fact it does not change the correctness of the solution, since the current code just forces a flush on the first call. The flush is simply not needed, and if the code is copy-pasted into another context it could be wrong.
Hi! You're 100% right. I updated the gist with your fixed version. Thanks!
Thanks for this! Very helpful.
I used this function in one of my projects but modified it a little bit before using it.
    import functools
    from datetime import datetime, timedelta


    def cache(seconds: int, maxsize: int = 128, typed: bool = False):
        def wrapper_cache(func):
            func = functools.lru_cache(maxsize=maxsize, typed=typed)(func)
            func.delta = timedelta(seconds=seconds)
            func.expiration = datetime.utcnow() + func.delta

            @functools.wraps(func)
            def wrapped_func(*args, **kwargs):
                if datetime.utcnow() >= func.expiration:
                    func.cache_clear()
                    func.expiration = datetime.utcnow() + func.delta
                return func(*args, **kwargs)

            return wrapped_func

        return wrapper_cache
Here are some notes about this version:
- The @cache decorator simply expects the number of seconds instead of the full list of arguments expected by timedelta. This avoids leaking timedelta's interface outside of the implementation of @cache. Having the number of seconds should be flexible enough to invalidate the cache at any interval.
- maxsize and typed can now be explicitly declared as part of the arguments expected by @cache.
- By adding the delta and expiration variables to the func we don't have to use nonlocal variables, which makes for more readable and compact code.
Also, here is a pytest test case:
    import time


    def test_cache():
        count = 0

        @cache(seconds=1)
        def test(arg1):
            nonlocal count
            count += 1
            return count

        assert test(1) == 1, "Function should be called the first time we invoke it"
        assert test(1) == 1, "Function should not be called because it is already cached"

        # Let's now wait for the cache to expire
        time.sleep(1)

        assert test(1) == 2, "Function should be called because the cache already expired"
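The test above has to sleep for a full second. One possible way around that, sketched here under assumptions, is to let the decorator accept the clock as an argument. The timer parameter is hypothetical and not part of the version above. The sketch also uses time.monotonic instead of datetime.utcnow(), so the expiry is unaffected by wall-clock adjustments.

```python
import functools
import time


def cache(seconds: float, maxsize: int = 128, typed: bool = False,
          timer=time.monotonic):
    """Variant of the decorator above; `timer` is a hypothetical extra parameter."""
    def wrapper_cache(func):
        func = functools.lru_cache(maxsize=maxsize, typed=typed)(func)
        func.expiration = timer() + seconds

        @functools.wraps(func)
        def wrapped_func(*args, **kwargs):
            if timer() >= func.expiration:
                func.cache_clear()
                func.expiration = timer() + seconds
            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache


def test_cache_without_sleeping():
    now = [0.0]  # fake clock, advanced by hand
    count = 0

    @cache(seconds=1, timer=lambda: now[0])
    def test(arg1):
        nonlocal count
        count += 1
        return count

    assert test(1) == 1, "Function should be called the first time we invoke it"
    assert test(1) == 1, "Function should not be called because it is already cached"

    now[0] = 1.0  # jump straight to the expiry time

    assert test(1) == 2, "Function should be called because the cache already expired"
```

Production code keeps the default timer. Only the tests pass a fake one.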
Thanks for sharing, it's very good!
I added some tests and notes to test_cache to address some people's doubts.
    import time


    def test_cache():
        count = 0
        count2 = 0

        @cache(seconds=1)
        def test(arg1):
            nonlocal count
            count += 1
            return count

        @cache(seconds=10)
        def test_another(arg2):
            nonlocal count2
            count2 += 1
            return count2

        assert test(1) == 1, "Function test with arg 1 should be called the first time we invoke it"
        assert test(1) == 1, "Function test with arg 1 should not be called because it is already cached"

        assert test(-1) == 2, "Function test with arg -1 should be called the first time we invoke it"
        assert test(-1) == 2, "Function test with arg -1 should not be called because it is already cached"

        assert test_another(1) == 1, "Function test_another with arg 1 should be called the first time we invoke it"
        assert test_another(1) == 1, "Function test_another with arg 1 should not be called because it is already cached"

        # Let's now wait for the cache to expire
        time.sleep(1)

        assert test(1) == 3, "Function test with arg 1 should be called because the cache already expired"
        assert test(-1) == 4, "Function test with arg -1 should be called because the cache already expired"

        # func.cache_clear() clears only that function's cache, not every lru_cache
        assert test_another(1) == 1, "Function test_another with arg 1 should not be called because the cache has NOT expired yet"