In the world of high-performance systems, APIs, and network security, rate limiting is a powerful technique to control traffic, prevent abuse, and ensure fair resource distribution. One of the most commonly used algorithms for rate limiting is the Token Bucket Algorithm.
This article will cover:
- What is the Token Bucket Algorithm?
- How it works
- Real-world use cases
- Python implementation
- Go implementation
- Comparison with Leaky Bucket
- Best practices
What is the Token Bucket Algorithm?
The Token Bucket Algorithm is a rate-limiting algorithm used to control the amount of data or requests a system can handle over time.
Imagine a bucket that fills with tokens at a fixed rate (e.g., 1 token every second). Every time a user wants to perform an action (like an API call), they must “pay” with a token. If there are no tokens in the bucket, the request is denied or delayed.
Key Characteristics
- Burst-friendly: Allows short bursts of activity as long as there are tokens.
- Flexible: You can adjust the bucket size (capacity) and refill rate.
- Real-time: Works well for live rate limiting.
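The burst-friendly behavior is easy to see with toy numbers (capacity 5, a burst of 6 requests — both illustrative): a full bucket lets the first 5 requests through back-to-back, and the 6th immediate request finds the bucket empty.

```python
# Illustrative numbers: a full bucket of 5 tokens. Refill is ignored
# here because the whole burst arrives "instantly".
capacity, tokens = 5.0, 5.0

allowed = 0
for _ in range(6):  # a burst of 6 back-to-back requests
    if tokens >= 1:
        tokens -= 1
        allowed += 1

print(allowed)  # 5: the burst drains the bucket, the 6th request is denied
```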
How Does It Work?
Initialization:
- Set a bucket capacity (maximum number of tokens it can hold).
- Set a token refill rate (tokens added per second).
Token Refill:
- At each time interval, tokens are added to the bucket, up to its capacity.
Request Handling:
- Each incoming request consumes one or more tokens.
- If enough tokens are available, the request is processed.
- If not, the request is rejected or delayed.
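Concretely, the refill step is just elapsed-time arithmetic. With hypothetical numbers (capacity 10, rate 2 tokens/sec, 3 tokens left, 2.5 seconds since the last refill), the bucket climbs back up but is capped at capacity:

```python
capacity = 10.0
refill_rate = 2.0   # tokens per second
tokens = 3.0        # tokens left after earlier requests
elapsed = 2.5       # seconds since the last refill

# Credit elapsed * rate new tokens, but never exceed capacity.
tokens = min(capacity, tokens + elapsed * refill_rate)
print(tokens)  # 8.0
```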
Real-World Use Cases
- API rate limiting – Prevent abuse of public APIs
- Network traffic shaping – Control bandwidth usage
- Login throttling – Slow down brute force attacks
- Task scheduling – Ensure fair access to shared resources
Python Implementation of Token Bucket
Let’s implement a basic Token Bucket in Python using threading and time.
```python
import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.lock = threading.Lock()
        self.last_refill = time.time()

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        added_tokens = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + added_tokens)
        self.last_refill = now

    def allow_request(self, tokens_needed=1):
        with self.lock:
            self._refill()
            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False

# Example usage
bucket = TokenBucket(capacity=10, refill_rate=2)  # 2 tokens/sec
for i in range(20):
    if bucket.allow_request():
        print(f"[{time.strftime('%X')}] Request {i} allowed")
    else:
        print(f"[{time.strftime('%X')}] Request {i} denied")
    time.sleep(0.3)
```
Try changing the `time.sleep(0.3)` delay. At 0.3 s per request, the refill (0.6 tokens per iteration) nearly keeps pace with consumption, so the initial 10-token capacity carries all 20 requests through. Shrink the delay to, say, 0.05 s and the bucket drains: roughly the first 11 requests are allowed and the rest are denied.
Golang Implementation of Token Bucket
Here’s a thread-safe token bucket in Go using `sync.Mutex` for locking. Instead of a background `time.Ticker`, it refills lazily: each request computes the time elapsed since the last refill and credits tokens accordingly.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	capacity     float64
	tokens       float64
	refillRate   float64 // tokens per second
	lastRefilled time.Time
	mu           sync.Mutex
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{
		capacity:     capacity,
		tokens:       capacity,
		refillRate:   refillRate,
		lastRefilled: time.Now(),
	}
}

func (tb *TokenBucket) AllowRequest(tokensNeeded float64) bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Lazy refill: credit tokens for the time elapsed since the last call.
	now := time.Now()
	elapsed := now.Sub(tb.lastRefilled).Seconds()
	tb.tokens += elapsed * tb.refillRate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.lastRefilled = now

	if tb.tokens >= tokensNeeded {
		tb.tokens -= tokensNeeded
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(10, 2)
	for i := 0; i < 20; i++ {
		if bucket.AllowRequest(1) {
			fmt.Printf("[%s] Request %d allowed\n", time.Now().Format("15:04:05"), i)
		} else {
			fmt.Printf("[%s] Request %d denied\n", time.Now().Format("15:04:05"), i)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```
Token Bucket vs. Leaky Bucket
| Feature | Token Bucket | Leaky Bucket |
|---|---|---|
| Burst support | ✅ Yes | ❌ No |
| Request shaping | ❌ No | ✅ Yes (constant rate) |
| Simplicity | ✅ Simple | ✅ Simple |
| Real-time control | ✅ Yes | ✅ Yes |
Best Practices
- Thread safety: Use locks or atomic operations.
- Precise timing: Use monotonic clocks to avoid system time jumps.
- Tuning parameters: Experiment with bucket size and refill rate.
- Distributed use: Store token buckets in Redis or distributed caches for multi-node systems.
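To make the timing point concrete, here is a minimal single-threaded sketch of the same bucket driven by `time.monotonic()` instead of `time.time()` (the class name is illustrative). Unlike the wall clock, the monotonic clock never jumps backward, so an NTP correction or manual clock change cannot mint or destroy tokens.

```python
import time

class MonotonicTokenBucket:
    """Single-threaded sketch; timing uses time.monotonic(),
    which is unaffected by wall-clock adjustments."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate       # tokens per second
        self.last_refill = time.monotonic()  # monotonic, not time.time()

    def allow_request(self, tokens_needed=1):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= tokens_needed:
            self.tokens -= tokens_needed
            return True
        return False

bucket = MonotonicTokenBucket(capacity=5, refill_rate=1)
burst_ok = all(bucket.allow_request() for _ in range(5))
sixth = bucket.allow_request()
print(burst_ok, sixth)  # True False: the burst passes, then the bucket is empty
```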
Conclusion
The Token Bucket Algorithm is a powerful yet simple way to limit the rate of actions across systems. Whether you’re building an API, controlling bandwidth, or preventing abuse, implementing this algorithm in Python or Go can greatly enhance the stability and fairness of your applications.
Want to go further? Add Redis to make it distributed, or use middleware in web frameworks like FastAPI or Gin to plug token buckets into API endpoints.
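As a taste of the middleware idea, here is a hedged sketch of putting a bucket in front of an ordinary Python function with a decorator. The names (`rate_limited`, the `RuntimeError`) are illustrative glue, not the API of any framework; real middleware for FastAPI or Gin would respond with HTTP 429 instead of raising.

```python
import functools
import threading
import time

class TokenBucket:
    """Compact copy of the bucket from the Python section above."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.lock = threading.Lock()
        self.last_refill = time.monotonic()

    def allow_request(self, tokens_needed=1):
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False

def rate_limited(bucket):
    """Reject calls with a RuntimeError when the bucket is empty."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not bucket.allow_request():
                raise RuntimeError("rate limit exceeded")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(TokenBucket(capacity=2, refill_rate=1))
def handler():
    return "ok"

print(handler(), handler())  # both pass: the bucket starts full
try:
    handler()                # third immediate call is rejected
except RuntimeError as e:
    print(e)                 # rate limit exceeded
```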