Top Related Projects
Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
memcached development tree
Disque is a distributed message broker
A generic dynamo implementation for different k-v storage engines
A fast, light-weight proxy for memcached and redis
Distributed reliable key-value store for the most critical data of a distributed system
Quick Overview
Pottery is a Python library that provides a Pythonic interface to Redis. It exposes Redis-backed containers such as dicts, lists, and sets that behave like their built-in Python counterparts, along with higher-level tools such as distributed locks, ID generation, function caching, and Bloom filters. Pottery requires a running Redis server: its data lives in Redis rather than in the Python process.
Pros
- Easy to use and integrate into existing Python projects
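- Exposes Redis through familiar Python container APIs (dict, set, list, Counter, deque, queue)
- Keeps data in Redis, so it can be shared across processes and machines and can outlive the application process
- Includes higher-level primitives such as Redlock, NextID, redis_cache(), and Bloom filters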
Cons
- Requires a running Redis server to connect to
- Keys, values, and elements must be JSON serializable
- RedisList index access is O(n), because Redis implements lists as linked lists
- Every container operation goes through Redis, so it is generally slower than the equivalent in-process Python container
Code Examples
- Creating and using a Pottery dict:
from redis import Redis
from pottery import RedisDict
# Connect to Redis, then create a RedisDict backed by the Redis key "my_dict"
redis = Redis.from_url("redis://localhost:6379/0")
my_dict = RedisDict(redis=redis, key="my_dict")
# Use it like a regular Python dict; the data is stored in Redis
my_dict["key"] = "value"
print(my_dict["key"]) # Output: value
- Working with a Pottery list:
from redis import Redis
from pottery import RedisList
# Create a RedisList backed by the Redis key "my_list"
redis = Redis.from_url("redis://localhost:6379/0")
my_list = RedisList(redis=redis, key="my_list")
# Append items to the list
my_list.append("item1")
my_list.append("item2")
# Access items by index
print(my_list[0]) # Output: item1
# Get the length of the list
print(len(my_list)) # Output: 2
- Using a Pottery set:
from redis import Redis
from pottery import RedisSet
# Create a RedisSet backed by the Redis key "my_set"
redis = Redis.from_url("redis://localhost:6379/0")
my_set = RedisSet(redis=redis, key="my_set")
# Add items to the set
my_set.add("apple")
my_set.add("banana")
my_set.add("apple") # Duplicate, won't be added
# Check membership
print("apple" in my_set) # Output: True
# Get the number of items in the set
print(len(my_set)) # Output: 2
Getting Started
To get started with Pottery, follow these steps:
- Install Pottery using pip:
pip install pottery
- Set up a Redis client and import the desired data structures from Pottery:
from redis import Redis
from pottery import RedisDict, RedisList, RedisSet
redis = Redis.from_url("redis://localhost:6379/0")
- Create an instance of the data structure, passing the Redis client and a key:
my_dict = RedisDict(redis=redis, key="my_dict")
- Use the data structure as you would a regular Python object; the data is stored in Redis under the key you chose:
my_dict["key"] = "value"
Competitor Comparisons
Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
Pros of Redis
- Mature, battle-tested, and widely adopted in-memory data store
- Supports complex data structures and operations beyond simple key-value storage
- Extensive ecosystem with many client libraries and tools
Cons of Redis
- Requires separate server setup and management
- Higher resource consumption, especially for small-scale applications
- Steeper learning curve for advanced features and configurations
Code Comparison
Redis (C):
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS;
int refcount;
void *ptr;
} robj;
Pottery (Python):
from pottery import RedisDict
# redis_client is an existing redis.Redis instance
tel = RedisDict({'jack': 4098, 'sape': 4139}, redis=redis_client, key='tel')
tel['guido'] = 4127
Redis implements its core object representation as a C struct inside the server, while Pottery is a client-side Python library that wraps Redis data types in Python container classes. The two are complementary rather than competing: Pottery needs a Redis server to talk to.
Use Redis directly when you need its full command set or clients in other languages. Use Pottery on top of Redis when you want Python-native containers, locks, and caches backed by it.
memcached development tree
Pros of Memcached
- Highly scalable and distributed caching system
- Widely adopted and battle-tested in production environments
- Supports multiple programming languages through client libraries
Cons of Memcached
- Limited data structure support (mainly key-value pairs)
- Lacks built-in persistence mechanisms
- Requires separate server setup and maintenance
Code Comparison
Memcached (C, libmemcached client API):
memcached_return_t memcached_set(memcached_st *ptr,
const char *key,
size_t key_length,
const char *value,
size_t value_length,
time_t expiration,
uint32_t flags);
Pottery (Python):
from pottery import RedisDict
cache = RedisDict(redis=redis_client, key='my_cache')
cache['key'] = 'value'
Pottery offers a more Pythonic interface, leveraging Redis as its backend. It provides dictionary-like operations and supports various data structures. Memcached, on the other hand, offers a lower-level API but with broader language support and established performance in large-scale deployments. While Memcached excels in distributed caching scenarios, Pottery simplifies Redis usage for Python developers, making it easier to integrate caching into their applications.
Disque is a distributed message broker
Pros of Disque
- Designed specifically for distributed job queues, offering specialized features
- Developed by the creator of Redis, leveraging expertise in distributed systems
- Supports advanced queue operations like delayed jobs and job replication
Cons of Disque
- More complex setup and configuration compared to Pottery
- Less actively maintained, with fewer recent updates
- Requires a separate server infrastructure, increasing operational overhead
Code Comparison
Disque job creation:
ADDJOB queue_name "job_data" 0 REPLICATE 3
Pottery job creation (where queue is a RedisSimpleQueue):
queue.put("job_data")
Summary
Disque is a specialized distributed job queue system, offering advanced features but requiring more complex setup. Pottery, on the other hand, provides a simpler Python-based solution for Redis-backed queues. Disque may be better suited for large-scale distributed systems with specific job queue requirements, while Pottery offers an easier integration for Python projects needing basic queue functionality.
A generic dynamo implementation for different k-v storage engines
Pros of Dynomite
- Designed for large-scale distributed systems, offering high availability and fault tolerance
- Supports multiple storage engines (Redis, Memcached) and provides cross-datacenter replication
- Backed by Netflix, ensuring enterprise-level support and ongoing development
Cons of Dynomite
- More complex setup and configuration compared to Pottery
- Steeper learning curve due to its distributed nature and advanced features
- May be overkill for smaller applications or simpler use cases
Code Comparison
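Pottery (Python):
from redis import Redis
from pottery import Redlock
redis = Redis.from_url('redis://localhost:6379/0')
# masters is a collection of Redis clients, not connection dicts
with Redlock(key='my-lock', masters={redis}):
    pass  # Critical section code here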
Dynomite (C):
struct node_info node;
node.hostname = "localhost";
node.port = 8102;
node.seeds = NULL;
dyn_init(&node);
// Additional configuration and usage code
Summary
Dynomite is a robust, distributed system designed for large-scale applications, while Pottery offers a simpler, Python-focused approach to Redis-based locking. Dynomite provides more advanced features but requires more setup, whereas Pottery is easier to use for basic Redis operations in Python environments.
A fast, light-weight proxy for memcached and redis
Pros of twemproxy
- Designed for high-performance, large-scale Redis and Memcached deployments
- Supports multiple hashing modes for consistent distribution
- Provides connection pooling and pipelining for improved efficiency
Cons of twemproxy
- Limited to Redis and Memcached protocols
- Less active development and maintenance compared to Pottery
- Requires additional setup and configuration for deployment
Code Comparison
twemproxy configuration example:
alpha:
listen: 127.0.0.1:22121
hash: fnv1a_64
distribution: ketama
auto_eject_hosts: true
redis: true
server_retry_timeout: 2000
server_failure_limit: 1
servers:
- 127.0.0.1:6379:1
Pottery usage example:
from pottery import RedisDict
redis_dict = RedisDict(redis=redis_client, key='my_dict')
redis_dict['key'] = 'value'
print(redis_dict['key']) # Output: 'value'
While twemproxy focuses on proxy-level optimizations for Redis and Memcached, Pottery provides high-level Redis data structures in Python. twemproxy is better suited for large-scale deployments requiring load balancing, while Pottery offers a more developer-friendly interface for Redis interactions within Python applications.
Distributed reliable key-value store for the most critical data of a distributed system
Pros of etcd
- Mature, battle-tested distributed key-value store with strong consistency
- Supports advanced features like watch, lease, and transactions
- Widely adopted in production environments, especially in Kubernetes
Cons of etcd
- More complex to set up and maintain
- Heavier resource requirements
- Steeper learning curve for developers
Code Comparison
etcd (Go):
cli, _ := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
defer cli.Close()
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
_, err := cli.Put(ctx, "key", "value")
cancel()
Pottery (Python):
from redis import Redis
from pottery import RedisDict
redis = Redis(host='localhost', port=6379, db=0)
pottery = RedisDict(redis=redis, key='my_dict')
pottery['key'] = 'value'
Summary
etcd is a robust distributed key-value store designed for high availability and consistency, making it suitable for complex distributed systems. Pottery, on the other hand, is a lightweight Redis-based dictionary implementation in Python, offering simplicity and ease of use for smaller-scale applications or those already using Redis.
README
Pottery: Redis for Humans
Redis is awesome, but Redis commands are not always intuitive. Pottery is a Pythonic way to access Redis. If you know how to use Python dicts, then you already know how to use Pottery. Pottery is useful for accessing Redis more easily, and also for implementing microservice resilience patterns; and it has been battle tested in production at scale.
Table of Contents
- Dicts
- Sets
- Lists
- Counters
- Deques
- Queues
- Redlock
- AIORedlock
- NextID
- redis_cache()
- CachedOrderedDict
- Bloom filters
- HyperLogLogs
- ContextTimer
Installation
$ pip3 install pottery
Usage
First, set up your Redis client:
>>> from redis import Redis
>>> redis = Redis.from_url('redis://localhost:6379/1')
>>>
Dicts
RedisDict is a Redis-backed container compatible with Python’s dict.
Here is a small example using a RedisDict:
>>> from pottery import RedisDict
>>> tel = RedisDict({'jack': 4098, 'sape': 4139}, redis=redis, key='tel')
>>> tel['guido'] = 4127
>>> tel
RedisDict{'jack': 4098, 'sape': 4139, 'guido': 4127}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
RedisDict{'jack': 4098, 'guido': 4127, 'irv': 4127}
>>> list(tel)
['jack', 'guido', 'irv']
>>> sorted(tel)
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
>>> 'jack' not in tel
False
>>>
Notice the first two keyword arguments to RedisDict(): The first is your Redis client. The second is the Redis key name for your dict. Other than that, you can use your RedisDict the same way that you use any other Python dict.
Limitations:
- Keys and values must be JSON serializable.
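One way to catch this limitation early is to test candidate values against the standard json module before storing them. This is a minimal sketch, assuming Pottery's serialization accepts roughly what json.dumps() accepts; the helper name is_json_serializable is hypothetical and not part of Pottery.

```python
import json


def is_json_serializable(value):
    """Return True if the standard json module can encode value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


candidates = {
    'int': 4098,
    'nested': {'tags': ['a', 'b'], 'active': True},
    'set': {1, 2, 3},   # sets are not a JSON type
    'bytes': b'raw',    # neither are bytes
}

# Map each candidate's name to whether json.dumps() accepts it.
report = {name: is_json_serializable(value) for name, value in candidates.items()}
print(report)
```

Values that fail the check, such as sets or bytes, would need converting (for example to lists or strings) before being stored.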
Sets
RedisSet is a Redis-backed container compatible with Python’s set.
Here is a brief demonstration:
>>> from pottery import RedisSet
>>> basket = RedisSet({'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}, redis=redis, key='basket')
>>> sorted(basket)
['apple', 'banana', 'orange', 'pear']
>>> 'orange' in basket
True
>>> 'crabgrass' in basket
False
>>> a = RedisSet('abracadabra', redis=redis, key='magic')
>>> b = set('alacazam')
>>> sorted(a)
['a', 'b', 'c', 'd', 'r']
>>> sorted(a - b)
['b', 'd', 'r']
>>> sorted(a | b)
['a', 'b', 'c', 'd', 'l', 'm', 'r', 'z']
>>> sorted(a & b)
['a', 'c']
>>> sorted(a ^ b)
['b', 'd', 'l', 'm', 'r', 'z']
>>>
Notice the two keyword arguments to RedisSet(): The first is your Redis client. The second is the Redis key name for your set. Other than that, you can use your RedisSet the same way that you use any other Python set.
Do more efficient membership testing for multiple elements using .contains_many():
>>> nirvana = RedisSet({'kurt', 'krist', 'dave'}, redis=redis, key='nirvana')
>>> tuple(nirvana.contains_many('kurt', 'krist', 'chat', 'dave'))
(True, True, False, True)
>>>
Limitations:
- Elements must be JSON serializable.
Lists
RedisList is a Redis-backed container compatible with Python’s list.
>>> from pottery import RedisList
>>> squares = RedisList([1, 4, 9, 16, 25], redis=redis, key='squares')
>>> squares
RedisList[1, 4, 9, 16, 25]
>>> squares[0]
1
>>> squares[-1]
25
>>> squares[-3:]
[9, 16, 25]
>>> squares[:]
[1, 4, 9, 16, 25]
>>> squares + [36, 49, 64, 81, 100]
RedisList[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>>
Notice the two keyword arguments to RedisList(): The first is your Redis client. The second is the Redis key name for your list. Other than that, you can use your RedisList the same way that you use any other Python list.
Limitations:
- Elements must be JSON serializable.
- Under the hood, Python implements list using an array. Redis implements list using a doubly linked list. As such, inserting elements at the head or tail of a RedisList is fast, O(1). However, accessing RedisList elements by index is slow, O(n). So in terms of performance and ideal use cases, RedisList is more similar to Python’s deque than Python’s list. Instead of RedisList, consider using RedisDeque.
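The O(1) versus O(n) trade-off described above can be made concrete with a toy linked list that counts how many nodes an index lookup walks. This is an illustrative in-process sketch only; the Node and LinkedList classes are hypothetical and say nothing about how Redis or Pottery are actually implemented.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Toy linked list that counts the hops taken by index lookups."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0
        self.hops = 0

    def append(self, value):
        # Appending at the tail touches a fixed number of nodes: O(1).
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self.size += 1

    def __getitem__(self, index):
        # Indexing walks from the head one node at a time: O(n).
        if not 0 <= index < self.size:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            node = node.next
            self.hops += 1
        return node.value


items = LinkedList()
for n in range(1000):
    items.append(n)

first = items[0]
hops_for_first = items.hops
last = items[999]
hops_for_last = items.hops - hops_for_first
print(hops_for_first, hops_for_last)
```

The lookup cost grows with the index, which is why workloads that mostly push and pop at the ends fit a deque-style container better than ones that index into the middle.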
Counters
RedisCounter is a Redis-backed container compatible with Python’s collections.Counter.
>>> from pottery import RedisCounter
>>> c = RedisCounter(redis=redis, key='my-counter')
>>> c = RedisCounter('gallahad', redis=redis, key='my-counter')
>>> c.clear()
>>> c = RedisCounter({'red': 4, 'blue': 2}, redis=redis, key='my-counter')
>>> c.clear()
>>> c = RedisCounter(redis=redis, key='my-counter', cats=4, dogs=8)
>>> c.clear()
>>> c = RedisCounter(['eggs', 'ham'], redis=redis, key='my-counter')
>>> c['bacon']
0
>>> c['sausage'] = 0
>>> del c['sausage']
>>> c.clear()
>>> c = RedisCounter(redis=redis, key='my-counter', a=4, b=2, c=0, d=-2)
>>> sorted(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']
>>> c.clear()
>>> RedisCounter('abracadabra', redis=redis, key='my-counter').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
>>> c.clear()
>>> c = RedisCounter(redis=redis, key='my-counter', a=4, b=2, c=0, d=-2)
>>> from collections import Counter
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
RedisCounter{'a': 3, 'b': 0, 'c': -3, 'd': -6}
>>>
Notice the first two keyword arguments to RedisCounter(): The first is your Redis client. The second is the Redis key name for your counter. Other than that, you can use your RedisCounter the same way that you use any other Python Counter.
Limitations:
- Keys must be JSON serializable.
Deques
RedisDeque is a Redis-backed container compatible with Python’s collections.deque.
Example:
>>> from pottery import RedisDeque
>>> d = RedisDeque('ghi', redis=redis, key='letters')
>>> for elem in d:
... print(elem.upper())
G
H
I
>>> d.append('j')
>>> d.appendleft('f')
>>> d
RedisDeque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()
'j'
>>> d.popleft()
'f'
>>> list(d)
['g', 'h', 'i']
>>> d[0]
'g'
>>> d[-1]
'i'
>>> list(reversed(d))
['i', 'h', 'g']
>>> 'h' in d
True
>>> d.extend('jkl')
>>> d
RedisDeque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)
>>> d
RedisDeque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)
>>> d
RedisDeque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> RedisDeque(reversed(d), redis=redis)
RedisDeque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()
>>> d.extendleft('abc')
>>> d
RedisDeque(['c', 'b', 'a'])
>>>
Notice the two keyword arguments to RedisDeque(): The first is your Redis client. The second is the Redis key name for your deque. Other than that, you can use your RedisDeque the same way that you use any other Python deque.
Limitations:
- Elements must be JSON serializable.
Queues
RedisSimpleQueue is a Redis-backed multi-producer, multi-consumer FIFO queue compatible with Python’s queue.SimpleQueue.
In general, use a Python queue.Queue if you’re using it in one or more threads, use multiprocessing.Queue if you’re using it between processes, and use RedisSimpleQueue if you’re sharing it across machines or if you need your queue to persist across application crashes or restarts.
Instantiate a RedisSimpleQueue:
>>> from pottery import RedisSimpleQueue
>>> cars = RedisSimpleQueue(redis=redis, key='cars')
>>>
Notice the two keyword arguments to RedisSimpleQueue(): The first is your Redis client. The second is the Redis key name for your queue. Other than that, you can use your RedisSimpleQueue the same way that you use any other Python queue.SimpleQueue.
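Because the interface matches queue.SimpleQueue, code written against that interface should not need to know which queue it was given. The sketch below uses only the standard library so that it runs without Redis; the drain() helper is hypothetical, and passing it a RedisSimpleQueue is an assumption based on the compatibility described above rather than something verified here.

```python
import queue


def drain(q):
    """Remove and return every item currently in q, in FIFO order.

    Relies only on empty() and get(). With several consumers, another
    consumer could empty the queue between the two calls, so this is
    meant for the single-consumer case.
    """
    items = []
    while not q.empty():
        items.append(q.get())
    return items


cars = queue.SimpleQueue()
for car in ('Jeep', 'Honda', 'Audi'):
    cars.put(car)

drained = drain(cars)
print(drained)
```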
Check the queue state, put some items in the queue, and get those items back out:
>>> cars.empty()
True
>>> cars.qsize()
0
>>> cars.put('Jeep')
>>> cars.put('Honda')
>>> cars.put('Audi')
>>> cars.empty()
False
>>> cars.qsize()
3
>>> cars.get()
'Jeep'
>>> cars.get()
'Honda'
>>> cars.get()
'Audi'
>>> cars.empty()
True
>>> cars.qsize()
0
>>>
Limitations:
- Items must be JSON serializable.
Redlock
Redlock is a safe and reliable lock to coordinate access to a resource shared across threads, processes, and even machines, without a single point of failure. Rationale and algorithm description.
Redlock implements Python’s excellent threading.Lock API as closely as is feasible. In other words, you can use Redlock the same way that you use threading.Lock. The main reason to use Redlock over threading.Lock is that Redlock can coordinate access to a resource shared across different machines; threading.Lock can’t.
Instantiate a Redlock:
>>> from pottery import Redlock
>>> printer_lock = Redlock(key='printer', masters={redis}, auto_release_time=.2)
>>>
The key argument represents the resource, and the masters argument specifies your Redis masters across which to distribute the lock. In production, you should have 5 Redis masters. This is to eliminate a single point of failure: you can lose up to 2 out of the 5 Redis masters and your Redlock will remain available and performant. Now you can protect access to your resource:
>>> if printer_lock.acquire():
... # Critical section - print stuff here.
... print('printer_lock is locked')
... printer_lock.release()
printer_lock is locked
>>> bool(printer_lock.locked())
False
>>>
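The earlier claim that a lock spread over 5 masters survives the loss of 2 follows from majority-quorum arithmetic. The sketch below assumes the lock is held once a strict majority of masters grant it, as in the published Redlock algorithm; the function names are hypothetical and not part of Pottery's API.

```python
def quorum(num_masters):
    """Smallest number of masters that forms a strict majority."""
    return num_masters // 2 + 1


def tolerated_failures(num_masters):
    """How many masters can be lost while a majority is still reachable."""
    return num_masters - quorum(num_masters)


for n in (1, 3, 5):
    print(f'{n} masters: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}')
```

With a single master, no failure can be tolerated, which is why the single-client examples here are suitable for development rather than production.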
Or you can protect access to your resource inside a context manager:
>>> with printer_lock:
... # Critical section - print stuff here.
... print('printer_lock is locked')
printer_lock is locked
>>> bool(printer_lock.locked())
False
>>>
It’s safest to instantiate a new Redlock object every time you need to protect your resource and to not share Redlock instances across different parts of code. In other words, think of the key as identifying the resource; don’t think of any particular Redlock as identifying the resource. Instantiating a new Redlock every time you need a lock sidesteps bugs by decoupling how you use Redlock from the forking/threading model of your application/service.
Redlocks are automatically released (by default, after 10 seconds). You should take care to ensure that your critical section completes well within that timeout. The reasons that Redlocks are automatically released are to preserve “liveness” and to avoid deadlocks (in the event that a process dies inside a critical section before it releases its lock).
>>> import time
>>> if printer_lock.acquire():
... # Critical section - print stuff here.
... time.sleep(printer_lock.auto_release_time)
>>> bool(printer_lock.locked())
False
>>>
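The auto-release behavior can be modeled in-process with a lock that records an expiry time instead of relying on an explicit release. This is a toy sketch under stated assumptions: ExpiringLock is hypothetical, is not thread-safe, and is not how Pottery implements Redlock. The injectable clock exists only to make the example deterministic.

```python
import time


class ExpiringLock:
    """Toy lock whose hold lapses auto_release_time seconds after acquire()."""

    def __init__(self, auto_release_time=10.0, clock=time.monotonic):
        self.auto_release_time = auto_release_time
        self._clock = clock
        self._expires_at = None  # None means "not held"

    def locked(self):
        return self._expires_at is not None and self._clock() < self._expires_at

    def acquire(self):
        if self.locked():
            return False
        self._expires_at = self._clock() + self.auto_release_time
        return True

    def release(self):
        self._expires_at = None


# Drive the lock with a fake clock so that no real sleeping is needed.
now = [0.0]
lock = ExpiringLock(auto_release_time=0.2, clock=lambda: now[0])

acquired = lock.acquire()
now[0] = 0.1
held_midway = lock.locked()        # still inside the 0.2 second window
now[0] = 0.25
held_after_expiry = lock.locked()  # the hold has lapsed without release()
reacquired = lock.acquire()        # so another holder can take the lock
print(acquired, held_midway, held_after_expiry, reacquired)
```

The holder that died never calls release(), yet the lock becomes available again, which is the liveness property described above.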
If 10 seconds isn’t enough to complete executing your critical section, then you can specify your own auto release time (in seconds):
>>> printer_lock = Redlock(key='printer', masters={redis}, auto_release_time=.2)
>>> if printer_lock.acquire():
... # Critical section - print stuff here.
... time.sleep(printer_lock.auto_release_time / 2)
>>> bool(printer_lock.locked())
True
>>> time.sleep(printer_lock.auto_release_time / 2)
>>> bool(printer_lock.locked())
False
>>>
By default, .acquire() blocks indefinitely until the lock is acquired. You can make .acquire() return immediately with the blocking argument. .acquire() returns True if the lock was acquired; False if not.
>>> printer_lock_1 = Redlock(key='printer', masters={redis}, auto_release_time=.2)
>>> printer_lock_2 = Redlock(key='printer', masters={redis}, auto_release_time=.2)
>>> printer_lock_1.acquire(blocking=False)
True
>>> printer_lock_2.acquire(blocking=False) # Returns immediately.
False
>>> printer_lock_1.release()
>>>
You can make .acquire() block but not indefinitely by specifying the timeout argument (in seconds):
>>> printer_lock_1.acquire()
True
>>> printer_lock_2.acquire(timeout=printer_lock_1.auto_release_time / 2) # Waits 100 milliseconds.
False
>>> import contextlib
>>> from pottery import ReleaseUnlockedLock
>>> with contextlib.suppress(ReleaseUnlockedLock):
... printer_lock_1.release()
>>>
You can similarly configure the Redlock context manager’s blocking/timeout behavior during Redlock initialization. If the context manager fails to acquire the lock, it raises the QuorumNotAchieved exception.
>>> import contextlib
>>> from pottery import QuorumNotAchieved
>>> printer_lock_1 = Redlock(key='printer', masters={redis}, context_manager_blocking=True, context_manager_timeout=0.2)
>>> printer_lock_2 = Redlock(key='printer', masters={redis}, context_manager_blocking=True, context_manager_timeout=0.2)
>>> with printer_lock_1:
... with contextlib.suppress(QuorumNotAchieved):
... with printer_lock_2: # Waits 200 milliseconds; raises QuorumNotAchieved.
... pass
... print(f"printer_lock_1 is {'locked' if printer_lock_1.locked() else 'unlocked'}")
... print(f"printer_lock_2 is {'locked' if printer_lock_2.locked() else 'unlocked'}")
printer_lock_1 is locked
printer_lock_2 is unlocked
>>>
synchronize()
synchronize() is a decorator that allows only one thread to execute a function at a time. Under the hood, synchronize() uses a Redlock, so refer to the Redlock documentation for more details.
Here’s how to use synchronize():
>>> from pottery import synchronize
>>> @synchronize(key='synchronized-func', masters={redis}, auto_release_time=1.5, blocking=True, timeout=-1)
... def func():
... # Only one thread can execute this function at a time.
... return True
...
>>> func()
True
>>>
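For intuition, the single-process analogue of such a decorator can be sketched with threading.Lock. This is not Pottery's implementation, which uses a Redlock so that exclusion also holds across processes and machines; the name synchronize_locally is hypothetical.

```python
import functools
import threading


def synchronize_locally(func):
    """In-process analogue: only one thread at a time may run func."""
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return wrapper


counter = {'value': 0}


@synchronize_locally
def increment():
    # Read-modify-write that could lose updates without the lock.
    current = counter['value']
    counter['value'] = current + 1


def worker():
    for _ in range(1000):
        increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(counter['value'])
```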
AIORedlock
AIORedlock is the asyncio implementation of Redlock, compatible with Python’s asyncio.Lock.
Instantiate an AIORedlock and protect a resource:
>>> import asyncio
>>> from redis.asyncio import Redis as AIORedis
>>> from pottery import AIORedlock
>>> async def main():
... aioredis = AIORedis.from_url('redis://localhost:6379/1')
... shower = AIORedlock(key='shower', masters={aioredis})
... if await shower.acquire():
... # Critical section - no other coroutine can enter while we hold the lock.
... print(f"shower is {'occupied' if await shower.locked() else 'available'}")
... await shower.release()
... print(f"shower is {'occupied' if await shower.locked() else 'available'}")
...
>>> asyncio.run(main(), debug=True)
shower is occupied
shower is available
>>>
Or you can protect access to your resource inside a context manager:
>>> asyncio.set_event_loop(asyncio.new_event_loop())
>>> async def main():
... aioredis = AIORedis.from_url('redis://localhost:6379/1')
... shower = AIORedlock(key='shower', masters={aioredis})
... async with shower:
... # Critical section - no other coroutine can enter while we hold the lock.
... print(f"shower is {'occupied' if await shower.locked() else 'available'}")
... print(f"shower is {'occupied' if await shower.locked() else 'available'}")
...
>>> asyncio.run(main(), debug=True)
shower is occupied
shower is available
>>>
NextID
NextID safely and reliably produces increasing IDs across threads, processes, and even machines, without a single point of failure. Rationale and algorithm description.
Instantiate an ID generator:
>>> from pottery import NextID
>>> tweet_ids = NextID(key='tweet-ids', masters={redis})
>>>
The key argument represents the sequence (so that you can have different sequences for user IDs, comment IDs, etc.), and the masters argument specifies your Redis masters across which to distribute ID generation (in production, you should have 5 Redis masters). Now, whenever you need a new ID, call next() on the ID generator:
>>> next(tweet_ids)
1
>>> next(tweet_ids)
2
>>> next(tweet_ids)
3
>>>
Two caveats:
- If many clients are generating IDs concurrently, then there may be “holes” in the sequence of IDs (e.g.: 1, 2, 6, 10, 11, 21, …).
- This algorithm scales to about 5,000 IDs per second (with 5 Redis masters). If you need IDs faster than that, then you may want to consider other techniques.
redis_cache()
redis_cache() is a simple lightweight unbounded function return value cache, sometimes called “memoize”.
redis_cache() implements Python’s excellent functools.cache() API as closely as is feasible. In other words, you can use redis_cache() the same way that you use functools.cache().
Limitations:
- Arguments to the function must be hashable.
- Return values from the function must be JSON serializable.
- Just like functools.cache(), redis_cache() does not allow for a maximum size, and does not evict old values, and grows unbounded. Only use redis_cache() in one of these cases:
  - Your function’s argument space has a known small cardinality.
  - You specify a timeout when calling redis_cache() to decorate your function, to dump your entire return value cache timeout seconds after the last cache access (hit or miss).
  - You periodically call .cache_clear() to dump your entire return value cache.
  - You’re ok with your return value cache growing unbounded, and you understand the implications of this for your underlying Redis instance.
In general, you should only use redis_cache() when you want to reuse previously computed values. Accordingly, it doesn’t make sense to cache functions with side-effects or impure functions such as time() or random().
Decorate a function:
>>> import time
>>> from pottery import redis_cache
>>> @redis_cache(redis=redis, key='expensive-function-cache')
... def expensive_function(n):
... time.sleep(.1) # Simulate an expensive computation or database lookup.
... return n
...
>>>
Notice the two keyword arguments to redis_cache(): The first is your Redis client. The second is the Redis key name for your function’s return value cache.
Call your function and observe the cache hit/miss rates:
>>> expensive_function(5)
5
>>> expensive_function.cache_info()
CacheInfo(hits=0, misses=1, maxsize=None, currsize=1)
>>> expensive_function(5)
5
>>> expensive_function.cache_info()
CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
>>> expensive_function(6)
6
>>> expensive_function.cache_info()
CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
>>>
Notice that the first call to expensive_function() takes about 0.1 seconds (the simulated delay) and results in a cache miss; but the second call returns almost immediately and results in a cache hit. This is because after the first call, redis_cache() cached the return value for the call when n == 5.
You can access your original undecorated underlying expensive_function() as expensive_function.__wrapped__. This is useful for introspection, for bypassing the cache, or for rewrapping the original function with a different cache.
You can force a cache reset for a particular combination of args/kwargs with expensive_function.__bypass__. A call to expensive_function.__bypass__(*args, **kwargs) bypasses the cache lookup, calls the original underlying function, then caches the results for future calls to expensive_function(*args, **kwargs). Note that a call to expensive_function.__bypass__(*args, **kwargs) results in neither a cache hit nor a cache miss.
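The difference between calling the decorated function, __wrapped__, and __bypass__ can be seen in a toy memoizer backed by a plain dict. This is a sketch of the semantics described above, not Pottery's code: dict_cache and its stats attribute are hypothetical, and the dict stands in for the Redis-backed cache.

```python
import functools


def dict_cache(func):
    """Toy memoizer backed by a plain dict (a stand-in for Redis)."""
    store = {}
    stats = {'hits': 0, 'misses': 0}

    def make_key(args, kwargs):
        return (args, tuple(sorted(kwargs.items())))

    @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        key = make_key(args, kwargs)
        if key in store:
            stats['hits'] += 1
        else:
            stats['misses'] += 1
            store[key] = func(*args, **kwargs)
        return store[key]

    def bypass(*args, **kwargs):
        # Skip the lookup, recompute, and refresh the cached value.
        # Counts as neither a hit nor a miss.
        store[make_key(args, kwargs)] = func(*args, **kwargs)
        return store[make_key(args, kwargs)]

    wrapper.__bypass__ = bypass
    wrapper.stats = stats
    return wrapper


calls = []


@dict_cache
def square(n):
    calls.append(n)  # record every real invocation
    return n * n


square(3)                # miss: runs the function
square(3)                # hit: served from the cache
square.__bypass__(3)     # runs the function again and refreshes the cache
square.__wrapped__(3)    # runs the function without touching the cache
print(len(calls), square.stats)
```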
Finally, clear/invalidate your function’s entire return value cache with expensive_function.cache_clear():
>>> expensive_function.cache_info()
CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
>>> expensive_function.cache_clear()
>>> expensive_function.cache_info()
CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
>>>
CachedOrderedDict
The best way that I can explain CachedOrderedDict is through an example use-case. Imagine that your search engine returns document IDs, which then you have to hydrate into full documents via the database to return to the client. The data structure used to represent such search results must have the following properties:
1. It must preserve the order of the document IDs returned by the search engine.
2. It must map document IDs to hydrated documents.
3. It must cache previously hydrated documents.
Properties 1 and 2 are satisfied by Python’s collections.OrderedDict. However, CachedOrderedDict extends Python’s OrderedDict to also satisfy property 3.
The most common usage pattern for CachedOrderedDict is as follows:
1. Instantiate CachedOrderedDict with the IDs that you must look up or compute passed in as the dict_keys argument to the initializer.
2. Compute and store the cache misses for future lookups.
3. Return some representation of your CachedOrderedDict to the client.
Instantiate a CachedOrderedDict:
>>> from pottery import CachedOrderedDict
>>> search_results_1 = CachedOrderedDict(
... redis_client=redis,
... redis_key='search-results',
... dict_keys=(1, 2, 3, 4, 5),
... )
>>>
The redis_client argument to the initializer is your Redis client, and the redis_key argument is the Redis key for the Redis Hash backing your cache. The dict_keys argument represents an ordered iterable of keys to be looked up and automatically populated in your CachedOrderedDict (on cache hits), or that you’ll have to compute and populate for future lookups (on cache misses). Regardless of whether keys are cache hits or misses, CachedOrderedDict preserves the order of dict_keys (like a list), maps those keys to values (like a dict), and maintains an underlying cache for future key lookups.
In the beginning, the cache is empty, so let’s populate it:
>>> sorted(search_results_1.misses())
[1, 2, 3, 4, 5]
>>> search_results_1[1] = 'one'
>>> search_results_1[2] = 'two'
>>> search_results_1[3] = 'three'
>>> search_results_1[4] = 'four'
>>> search_results_1[5] = 'five'
>>> sorted(search_results_1.misses())
[]
>>>
Note that CachedOrderedDict preserves the order of dict_keys:
>>> for key, value in search_results_1.items():
... print(f'{key}: {value}')
1: one
2: two
3: three
4: four
5: five
>>>
Now, let’s look at a combination of cache hits and misses:
>>> search_results_2 = CachedOrderedDict(
... redis_client=redis,
... redis_key='search-results',
... dict_keys=(2, 4, 6, 8, 10),
... )
>>> sorted(search_results_2.misses())
[6, 8, 10]
>>> search_results_2[2]
'two'
>>> search_results_2[6] = 'six'
>>> search_results_2[8] = 'eight'
>>> search_results_2[10] = 'ten'
>>> sorted(search_results_2.misses())
[]
>>> for key, value in search_results_2.items():
... print(f'{key}: {value}')
2: two
4: four
6: six
8: eight
10: ten
>>>
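The "compute and store the cache misses" step of the usage pattern can be factored into a small helper. The sketch below assumes only that the container exposes .misses() and item assignment, as shown in the examples above. So that it runs without Redis, it is exercised against FakeCachedDict, a hypothetical stand-in and not a Pottery class.

```python
def fill_misses(cached, compute):
    """Compute and store a value for every key reported by cached.misses()."""
    # Copy the misses first, in case storing a value shrinks the collection.
    for key in list(cached.misses()):
        cached[key] = compute(key)
    return cached


class FakeCachedDict(dict):
    """Hypothetical stand-in: keys preloaded with None count as misses."""

    def misses(self):
        return [key for key, value in self.items() if value is None]


# Keys 2 and 4 are "cache hits"; 6 and 8 still need to be hydrated.
results = FakeCachedDict({2: 'two', 4: 'four', 6: None, 8: None})
fill_misses(results, compute=lambda key: f'document-{key}')

for key, value in results.items():
    print(f'{key}: {value}')
```

In a real service, compute would be the database lookup that hydrates a document ID into a full document.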
Limitations:
- Keys and values must be JSON serializable.
Bloom filters ð¸
Bloom filters are a powerful data structure that help you to answer the questions, “Have I seen this element before?” and “How many distinct elements have I seen?”; but not the question, “What are all of the elements that I’ve seen before?” So think of Bloom filters as Python sets that you can add elements to, use to test element membership, and get the length of; but that you can’t iterate through or get elements back out of.
Bloom filters are probabilistic, which means that they can sometimes generate false positives (as in, they may report that you’ve seen a particular element before even though you haven’t). But they will never generate false negatives (so every time that they report that you haven’t seen a particular element before, you really must never have seen it). You can tune your acceptable false positive probability, though at the expense of the storage size and the element insertion/lookup time of your Bloom filter.
Create a BloomFilter:
>>> from pottery import BloomFilter
>>> dilberts = BloomFilter(
... num_elements=100,
... false_positives=0.01,
... redis=redis,
... key='dilberts',
... )
>>>
Here, num_elements represents the number of elements that you expect to insert into your BloomFilter, and false_positives represents your acceptable false positive probability. Using these two parameters, BloomFilter automatically computes its own storage size and number of times to run its hash functions on element insertion/lookup such that it can guarantee a false positive rate at or below what you can tolerate, given that you’re going to insert your specified number of elements.
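For intuition, the textbook Bloom filter formulas relate these parameters as m = -n ln(p) / (ln 2)^2 bits of storage and k = (m / n) ln 2 hash functions, where n is num_elements and p is false_positives. The sketch below evaluates them. The function names are hypothetical, rounding up is this sketch's choice, and whether Pottery rounds the same way is an assumption.

```python
import math


def bloom_size_in_bits(num_elements, false_positives):
    """Textbook bit-array size m for n elements at false positive rate p."""
    size = -num_elements * math.log(false_positives) / math.log(2) ** 2
    return math.ceil(size)


def bloom_num_hashes(num_elements, false_positives):
    """Textbook number of hash functions k for the size computed above."""
    size = bloom_size_in_bits(num_elements, false_positives)
    return math.ceil(size / num_elements * math.log(2))


bits = bloom_size_in_bits(100, 0.01)
hashes = bloom_num_hashes(100, 0.01)
print(bits, hashes)    # 959 7
```

Under these formulas, the dilberts filter above would need roughly 120 bytes of storage. Lowering false_positives or raising num_elements increases both numbers, which is the storage and lookup-time trade-off described above.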
Insert an element into the BloomFilter:
>>> dilberts.add('rajiv')
>>>
Test for membership in the BloomFilter:
>>> 'rajiv' in dilberts
True
>>> 'raj' in dilberts
False
>>> 'dan' in dilberts
False
>>>
See how many elements we’ve inserted into the BloomFilter:
>>> len(dilberts)
1
>>>
Note that BloomFilter.__len__() is an approximation, not an exact value, though it’s quite accurate.
Insert multiple elements into the BloomFilter:
>>> dilberts.update({'raj', 'dan'})
>>>
Do more efficient membership testing for multiple elements using .contains_many():
>>> tuple(dilberts.contains_many('rajiv', 'raj', 'dan', 'luis'))
(True, True, True, False)
>>>
Remove all of the elements from the BloomFilter:
>>> dilberts.clear()
>>> len(dilberts)
0
>>>
Limitations:
- Elements must be JSON serializable.
- len(bf) is probabilistic in that it’s an accurate approximation. You can tune how accurate you want it to be with the num_elements and false_positives arguments to .__init__(), at the expense of storage space and insertion/lookup time.
- Membership testing against a Bloom filter is probabilistic in that it may return false positives, but never returns false negatives. This means that if element in bf evaluates to True, then you may have inserted the element into the Bloom filter. But if element in bf evaluates to False, then you must not have inserted it. Again, you can tune accuracy with the num_elements and false_positives arguments to .__init__(), at the expense of storage space and insertion/lookup time.
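To see why false negatives cannot happen while false positives can, here is a toy in-memory Bloom filter. It is a minimal sketch and not Pottery's implementation: TinyBloomFilter, its salted SHA-256 hashing, and the default size and hash count are all assumptions made for illustration.

```python
import hashlib
import math


class TinyBloomFilter:
    """Toy in-memory Bloom filter (hypothetical; not Pottery's implementation)."""

    def __init__(self, size=959, num_hashes=7):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _offsets(self, element):
        # Derive num_hashes bit offsets by salting one hash with a counter.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f'{seed}:{element}'.encode()).digest()
            yield int.from_bytes(digest[:8], 'big') % self.size

    def add(self, element):
        for offset in self._offsets(element):
            self.bits[offset] = True

    def __contains__(self, element):
        # Every bit set by add() stays set, so an inserted element always
        # passes this check. An element that was never inserted passes only
        # if other elements happen to have set all of its bits.
        return all(self.bits[offset] for offset in self._offsets(element))

    def __len__(self):
        # Standard cardinality estimate from the number of set bits.
        set_bits = min(sum(self.bits), self.size - 1)   # avoid log(0) when full
        estimate = -self.size / self.num_hashes * math.log(1 - set_bits / self.size)
        return round(estimate)


bf = TinyBloomFilter()
for name in ('rajiv', 'raj', 'dan'):
    bf.add(name)
print(all(name in bf for name in ('rajiv', 'raj', 'dan')))   # True
print(len(bf))   # an estimate; expected to be close to 3
```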
HyperLogLogs 🪵
HyperLogLogs are an interesting data structure designed to answer the question, “How many distinct elements have I seen?”; but not the questions, “Have I seen this element before?” or “What are all of the elements that I’ve seen before?” So think of HyperLogLogs as Python sets that you can add elements to and get the length of; but that you can’t use to test element membership, iterate through, or get elements out of.
HyperLogLogs are probabilistic, which means that they’re accurate within a margin of error up to 2%. However, they can reasonably accurately estimate the cardinality (size) of vast datasets (like the number of unique Google searches issued in a day) with a tiny amount of storage (1.5 KB).
Create a HyperLogLog:
>>> from pottery import HyperLogLog
>>> google_searches = HyperLogLog(redis=redis, key='google-searches')
>>>
Insert an element into the HyperLogLog:
>>> google_searches.add('sonic the hedgehog video game')
>>>
See how many elements we’ve inserted into the HyperLogLog:
>>> len(google_searches)
1
>>>
Insert multiple elements into the HyperLogLog:
>>> google_searches.update({
... 'google in 1998',
... 'minesweeper',
... 'joey tribbiani',
... 'wizard of oz',
... 'rgb to hex',
... 'pac-man',
... 'breathing exercise',
... 'do a barrel roll',
... 'snake',
... })
>>> len(google_searches)
10
>>>
Through a clever hack, we can do membership testing against a HyperLogLog, even though it was never designed for this purpose. The hack creates a temporary copy of the HyperLogLog, then inserts the element that you’re running the membership test for into the temporary copy. If the insertion changes the temporary HyperLogLog’s cardinality, then the element must not have been inserted into the original HyperLogLog. If the cardinality is unchanged, the element is reported as present, which may be a false positive.
>>> 'joey tribbiani' in google_searches
True
>>> 'jennifer aniston' in google_searches
False
>>>
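The copy-and-insert hack described above can be sketched without Redis using a toy register array. This is an illustration only: TinyHyperLogLog and probably_contains are hypothetical names, the register layout is not Redis's encoding, and the sketch uses "did any register change?" as a stand-in for "did the cardinality change?".

```python
import copy
import hashlib


class TinyHyperLogLog:
    """Toy in-memory HyperLogLog registers (hypothetical; not Redis's encoding)."""

    def __init__(self, precision=14):
        self.precision = precision
        self.registers = [0] * (1 << precision)

    def add(self, element):
        """Insert element and return True if any register changed."""
        digest = hashlib.sha256(str(element).encode()).digest()
        value = int.from_bytes(digest[:8], 'big')          # 64-bit hash
        rest_bits = 64 - self.precision
        index = value >> rest_bits                         # leading bits pick a register
        rest = value & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1           # leading zeros in rest, plus 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False


def probably_contains(hll, element):
    """Membership test via a temporary copy, leaving hll untouched."""
    temporary = copy.deepcopy(hll)
    changed = temporary.add(element)
    # A change proves the element was never inserted. No change means it was
    # probably inserted, although another element may have masked it.
    return not changed


searches = TinyHyperLogLog()
searches.add('joey tribbiani')
print(probably_contains(searches, 'joey tribbiani'))   # True
```

Re-inserting an element can never change a register, which is why this test has no false negatives.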
Do more efficient membership testing for multiple elements using .contains_many():
>>> tuple(google_searches.contains_many('joey tribbiani', 'jennifer aniston'))
(True, False)
>>>
Remove all of the elements from the HyperLogLog:
>>> google_searches.clear()
>>> len(google_searches)
0
>>>
Limitations:
- Elements must be JSON serializable.
- len(hll) is probabilistic in that it’s an accurate approximation.
- Membership testing against a HyperLogLog is probabilistic in that it may return false positives, but never returns false negatives. This means that if element in hll evaluates to True, then you may have inserted the element into the HyperLogLog. But if element in hll evaluates to False, then you must not have inserted it.
ContextTimer ⏱️
ContextTimer helps you easily and accurately measure elapsed time. Note that ContextTimer measures wall (real-world) time, not CPU time; and that elapsed() returns time in milliseconds.
You can use ContextTimer stand-alone…
>>> import time
>>> from pottery import ContextTimer
>>> timer = ContextTimer()
>>> timer.start()
>>> time.sleep(0.1)
>>> 100 <= timer.elapsed() < 200
True
>>> timer.stop()
>>> time.sleep(0.1)
>>> 100 <= timer.elapsed() < 200
True
>>>
…or as a context manager:
>>> tests = []
>>> with ContextTimer() as timer:
... time.sleep(0.1)
... tests.append(100 <= timer.elapsed() < 200)
>>> time.sleep(0.1)
>>> tests.append(100 <= timer.elapsed() < 200)
>>> tests
[True, True]
>>>
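The difference between wall time and CPU time matters for code that mostly waits. The standard-library sketch below measures both for a sleep; which clock ContextTimer uses internally is not stated in this README, and measure is a hypothetical helper.

```python
import time


def measure(func):
    """Run func and return (wall_ms, cpu_ms) for the call."""
    wall_start = time.perf_counter()    # wall-clock (real-world) timer
    cpu_start = time.process_time()     # CPU time consumed by this process
    func()
    wall_ms = (time.perf_counter() - wall_start) * 1000
    cpu_ms = (time.process_time() - cpu_start) * 1000
    return wall_ms, cpu_ms


wall_ms, cpu_ms = measure(lambda: time.sleep(0.1))
print(f'wall: {wall_ms:.0f} ms, cpu: {cpu_ms:.0f} ms')
```

Sleeping consumes almost no CPU, so the wall figure lands near 100 ms while the CPU figure stays close to zero. A wall-time timer such as ContextTimer would report the former.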
Contributing
Obtain source code
- Clone the git repo:
$ git clone git@github.com:brainix/pottery.git
$ cd pottery/
- Install project-level dependencies:
$ make install
Run tests
- In one Terminal session:
$ cd pottery/
$ redis-server
- In a second Terminal session:
$ cd pottery/
$ make test
$ make test-readme
make test runs all of the unit tests as well as the coverage test. However, sometimes, when debugging, it can be useful to run an individual test module, class, or method:
- In one Terminal session:
$ cd pottery/
$ redis-server
- In a second Terminal session:
- Run a test module with:
$ make test tests=tests.test_dict
- Run a test class with:
$ make test tests=tests.test_dict.DictTests
- Run a test method with:
$ make test tests=tests.test_dict.DictTests.test_keyexistserror
make test-readme doctests the Python code examples in this README to ensure that they’re correct.