In the world of high-performance systems, APIs, and network security, rate limiting is a powerful technique to control traffic, prevent abuse, and ensure fair resource distribution. One of the most commonly used algorithms for rate limiting is the Token Bucket Algorithm.
This article will cover:
- What is the Token Bucket Algorithm?
- How it works
- Real-world use cases
- Python implementation
- Go implementation
- Comparison with Leaky Bucket
- Best practices
What is the Token Bucket Algorithm?
The Token Bucket Algorithm is a rate-limiting algorithm used to control the amount of data or requests a system can handle over time.
Imagine a bucket that fills with tokens at a fixed rate (e.g., 1 token every second). Every time a user wants to perform an action (like an API call), they must “pay” with a token. If there are no tokens in the bucket, the request is denied or delayed.
Key Characteristics
- Burst-friendly: Allows short bursts of activity as long as there are tokens.
- Flexible: You can adjust the bucket size (capacity) and refill rate.
- Real-time: Works well for live rate limiting.
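The burst-friendly behavior is easy to see with toy numbers (capacity 5, a burst of 6 requests — both illustrative): a full bucket lets the first 5 requests through back-to-back, and the 6th immediate request finds the bucket empty.

```python
# Illustrative numbers: a full bucket of 5 tokens. Refill is ignored
# here because the whole burst arrives "instantly".
capacity, tokens = 5.0, 5.0

allowed = 0
for _ in range(6):  # a burst of 6 back-to-back requests
    if tokens >= 1:
        tokens -= 1
        allowed += 1

print(allowed)  # 5: the burst drains the bucket, the 6th request is denied
```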
How Does It Work?
Initialization:
- Set a bucket capacity (maximum number of tokens it can hold).
- Set a token refill rate (tokens added per second).
Token Refill:
- At each time interval, tokens are added to the bucket, up to its capacity.
Request Handling:
- Each incoming request consumes one or more tokens.
- If enough tokens are available, the request is processed.
- If not, the request is rejected or delayed.
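Concretely, the refill step is just elapsed-time arithmetic. With hypothetical numbers (capacity 10, rate 2 tokens/sec, 3 tokens left, 2.5 seconds since the last refill), the bucket climbs back up but is capped at capacity:

```python
capacity = 10.0
refill_rate = 2.0   # tokens per second
tokens = 3.0        # tokens left after earlier requests
elapsed = 2.5       # seconds since the last refill

# Credit elapsed * rate new tokens, but never exceed capacity.
tokens = min(capacity, tokens + elapsed * refill_rate)
print(tokens)  # 8.0
```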
Real-World Use Cases
- API rate limiting – Prevent abuse of public APIs
- Network traffic shaping – Control bandwidth usage
- Login throttling – Slow down brute force attacks
- Task scheduling – Ensure fair access to shared resources
Python Implementation of Token Bucket
Let’s implement a basic Token Bucket in Python using threading and time.
```python
import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.lock = threading.Lock()
        self.last_refill = time.time()

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        added_tokens = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + added_tokens)
        self.last_refill = now

    def allow_request(self, tokens_needed=1):
        with self.lock:
            self._refill()
            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False

# Example usage
bucket = TokenBucket(capacity=10, refill_rate=2)  # 2 tokens/sec
for i in range(20):
    if bucket.allow_request():
        print(f"[{time.strftime('%X')}] Request {i} allowed")
    else:
        print(f"[{time.strftime('%X')}] Request {i} denied")
    time.sleep(0.3)
```
Try changing the `time.sleep(0.3)` delay. At 0.3 s per request, the refill (0.6 tokens per iteration) nearly keeps pace with consumption, so the initial 10-token capacity carries all 20 requests through. Shrink the delay to, say, 0.05 s and the bucket drains: roughly the first 11 requests are allowed and the rest are denied.
Golang Implementation of Token Bucket
Here’s a thread-safe token bucket in Go using `sync.Mutex` for locking. Instead of a background `time.Ticker`, it refills lazily: each request computes the time elapsed since the last refill and credits tokens accordingly.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	capacity     float64
	tokens       float64
	refillRate   float64 // tokens per second
	lastRefilled time.Time
	mu           sync.Mutex
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{
		capacity:     capacity,
		tokens:       capacity,
		refillRate:   refillRate,
		lastRefilled: time.Now(),
	}
}

func (tb *TokenBucket) AllowRequest(tokensNeeded float64) bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Lazy refill: credit tokens for the time elapsed since the last call.
	now := time.Now()
	elapsed := now.Sub(tb.lastRefilled).Seconds()
	tb.tokens += elapsed * tb.refillRate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.lastRefilled = now

	if tb.tokens >= tokensNeeded {
		tb.tokens -= tokensNeeded
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(10, 2)
	for i := 0; i < 20; i++ {
		if bucket.AllowRequest(1) {
			fmt.Printf("[%s] Request %d allowed\n", time.Now().Format("15:04:05"), i)
		} else {
			fmt.Printf("[%s] Request %d denied\n", time.Now().Format("15:04:05"), i)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```
Token Bucket vs. Leaky Bucket
| Feature | Token Bucket | Leaky Bucket |
|---|---|---|
| Burst support | ✅ Yes | ❌ No |
| Request shaping | ❌ No | ✅ Yes (constant rate) |
| Simplicity | ✅ Simple | ✅ Simple |
| Real-time control | ✅ Yes | ✅ Yes |
Best Practices
- Thread safety: Use locks or atomic operations.
- Precise timing: Use monotonic clocks to avoid system time jumps.
- Tuning parameters: Experiment with bucket size and refill rate.
- Distributed use: Store token buckets in Redis or distributed caches for multi-node systems.
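To make the timing point concrete, here is a minimal single-threaded sketch of the same bucket driven by `time.monotonic()` instead of `time.time()` (the class name is illustrative). Unlike the wall clock, the monotonic clock never jumps backward, so an NTP correction or manual clock change cannot mint or destroy tokens.

```python
import time

class MonotonicTokenBucket:
    """Single-threaded sketch; timing uses time.monotonic(),
    which is unaffected by wall-clock adjustments."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate       # tokens per second
        self.last_refill = time.monotonic()  # monotonic, not time.time()

    def allow_request(self, tokens_needed=1):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= tokens_needed:
            self.tokens -= tokens_needed
            return True
        return False

bucket = MonotonicTokenBucket(capacity=5, refill_rate=1)
burst_ok = all(bucket.allow_request() for _ in range(5))
sixth = bucket.allow_request()
print(burst_ok, sixth)  # True False: the burst passes, then the bucket is empty
```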
Conclusion
The Token Bucket Algorithm is a powerful yet simple way to limit the rate of actions across systems. Whether you’re building an API, controlling bandwidth, or preventing abuse, implementing this algorithm in Python or Go can greatly enhance the stability and fairness of your applications.
Want to go further? Add Redis to make it distributed, or use middleware in web frameworks like FastAPI or Gin to plug token buckets into API endpoints.
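As a taste of the middleware idea, here is a hedged sketch of putting a bucket in front of an ordinary Python function with a decorator. The names (`rate_limited`, the `RuntimeError`) are illustrative glue, not the API of any framework; real middleware for FastAPI or Gin would respond with HTTP 429 instead of raising.

```python
import functools
import threading
import time

class TokenBucket:
    """Compact copy of the bucket from the Python section above."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.lock = threading.Lock()
        self.last_refill = time.monotonic()

    def allow_request(self, tokens_needed=1):
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False

def rate_limited(bucket):
    """Reject calls with a RuntimeError when the bucket is empty."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not bucket.allow_request():
                raise RuntimeError("rate limit exceeded")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(TokenBucket(capacity=2, refill_rate=1))
def handler():
    return "ok"

print(handler(), handler())  # both pass: the bucket starts full
try:
    handler()                # third immediate call is rejected
except RuntimeError as e:
    print(e)                 # rate limit exceeded
```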