A Study of API Rate Limiting

HTTP rate limiters can act as the first line of defense in keeping your services in line.

Here are a few more real-life use cases:

  • Limiting POST on /login and /signup helps stop spammers from creating too many accounts.
  • Limiting POST on the /upload endpoint lets you control how much memory your application spends buffering and streaming blobs.
  • Limiting GET on some of your public API endpoints protects you from customers pulling too much data in a very short period of time.
  • Public API endpoints still need dedicated DDoS protection: this middleware won’t help you if your link is saturated through other means of attack.
  • As mentioned above, limiting your private API endpoints protects you from a DoS caused by your own services. This accidental DoS can even come from your own JavaScript.

Common Practices

Request Queues

When using queues, we allow a certain number of requests to be handled by the system, and the rest are placed on a queue to be handled later.

Throttling

Throttling lets API developers control how their API is used by setting up a temporary state, allowing the API to assess each request. When the throttle is triggered, a user may either be disconnected or simply have their bandwidth reduced.

Algorithms

Token Bucket

A token bucket gives each user a set number of tokens (allowed requests). Alongside that, we have a reset timer that periodically restores the number of available tokens for each user.

Each request from the user consumes one token. When the count reaches 0, we stop serving that particular user until the timer resets the tokens.

Leaky Bucket

The leaky bucket algorithm is a simple, easy-to-implement rate-limiting solution. It places requests in a First In, First Out (FIFO) queue and processes the items on the queue at a regular rate.

Leaky Bucket smooths out bursts of traffic and is easy to implement on a single server or load balancer. It’s also small and memory-efficient thanks to the limited queue size.
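A minimal sketch of the bounded-FIFO idea, using a buffered channel as the queue (`LeakyBucket`, `Offer`, and `Leak` are hypothetical names for this illustration; in a real server, `Leak` would run on a ticker so the outflow rate stays constant):

```go
package main

import "fmt"

// LeakyBucket queues incoming requests while there is room and
// rejects the overflow; a drain step consumes them at a steady rate.
type LeakyBucket struct {
	queue chan string
}

func NewLeakyBucket(capacity int) *LeakyBucket {
	return &LeakyBucket{queue: make(chan string, capacity)}
}

// Offer enqueues a request, or reports false if the bucket is full.
func (b *LeakyBucket) Offer(req string) bool {
	select {
	case b.queue <- req:
		return true
	default:
		return false
	}
}

// Leak removes and returns the oldest queued request, if any.
func (b *LeakyBucket) Leak() (string, bool) {
	select {
	case req := <-b.queue:
		return req, true
	default:
		return "", false
	}
}

func main() {
	b := NewLeakyBucket(2)
	fmt.Println(b.Offer("a"), b.Offer("b"), b.Offer("c")) // true true false
	req, _ := b.Leak()
	fmt.Println(req) // a
}
```

The bounded channel is what makes the implementation memory-efficient: the queue can never grow past its capacity, no matter how bursty the traffic.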

Fixed Window

Fixed window algorithms track the rate of requests using a simple incremental counter. The window is defined for a set number of seconds. If the counter exceeds the limit within that window, the additional requests are discarded.

The fixed window algorithm is a simple way to ensure your API doesn’t get bogged down with old requests. Your API can still be overloaded using this method, however. If a slew of requests is made when the window refreshes, your API could still be stampeded.

Sliding Log

A sliding log algorithm tracks each request via a time-stamped log. Log entries older than the window are discarded. When a new request comes in, the remaining entries are counted to determine the request rate. If the request would exceed the limit threshold, it is simply queued.

Sliding log algorithms don’t suffer from the stampeding issue of fixed windows, but storing a log entry for every request can get quite expensive, and calculating the number of requests across multiple servers can also be costly.

Sliding Window

Sliding window algorithms combine the best of Fixed Window and Sliding Log algorithms. A cumulative counter for a set period is used, similar to the fixed window algorithm. The previous window is also assessed to help smooth bursts of traffic.

The small number of data points needed to assess each request makes the sliding window algorithm an ideal choice for processing large amounts of requests while still being light and fast to run.

Shut up and show me some code!

Writing an API rate limiter in Go is made easier by the fact that the package `x/time/rate` implements the Token Bucket algorithm. For my case, I'll create an HTTP middleware that I can plug in whenever I need it.

The Go package provides rate.NewLimiter(r, b), which returns a limiter that controls how frequently events are allowed to happen. It implements a "token bucket" of size b, initially full and refilled at rate r tokens per second.

Below is a simplistic implementation built using a Go map that stores a limiter for each IP/route combination (we could limit by just the IP if we wanted to, or go even fancier with user sessions).

package middleware

import (
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"

	"github.com/gin-gonic/gin"
	"golang.org/x/time/rate"
)

var (
	visitorRate   int = 25 // allowed requests per second per visitor
	burst         int = 5  // extra requests tolerated in a burst
	resetInterval int = 5  // minutes between visitor-map resets
)

type visitor struct {
	limiter   *rate.Limiter
	lastVisit time.Time
}

// Visitors is a map that stores a limiter for each user
var visitors = make(map[string]*visitor)
var mu sync.Mutex
var ticker *time.Ticker

func init() {
	ticker = time.NewTicker(time.Duration(resetInterval) * time.Minute)
	go cleanVisitorMap() // usually we will pass a context to the go routine in order to stop it if we want
}

// cleanVisitorMap makes sure we periodically clean the map to avoid memory issues.
// This current implementation might allow clients to go over the allowed limit, but its impact is at most 2x, so we will leave it as is.
func cleanVisitorMap() {
	for range ticker.C {
		mu.Lock()
		// Reuse the previous size as a capacity hint for the fresh map.
		visitors = make(map[string]*visitor, len(visitors))
		mu.Unlock()
	}
}

func getVisitor(ID string) *visitor {
	mu.Lock()
	defer mu.Unlock()

	currVisitor, exists := visitors[ID]
	if !exists {
		rt := rate.Every(time.Second / time.Duration(visitorRate))
		limiter := rate.NewLimiter(rt, burst)
		visitors[ID] = &visitor{limiter, time.Now()}
		return visitors[ID]
	}

	return currVisitor
}

// Limit is the middleware responsible for keeping track of and restricting how many calls clients are allowed to make
func Limit() gin.HandlerFunc {
	return func(c *gin.Context) {
		ip, _, err := net.SplitHostPort(c.Request.RemoteAddr)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"Error": http.StatusText(http.StatusInternalServerError)})
			c.Abort()
			return
		}

		ID := fmt.Sprintf("%v%v", ip, c.FullPath())

		// Get visitor and test if the limit has been reached
		v := getVisitor(ID)
		if !v.limiter.Allow() {
			c.JSON(http.StatusTooManyRequests, gin.H{"Error": http.StatusText(http.StatusTooManyRequests)})
			c.Abort()
			return
		}
	}
}