A single unprotected API endpoint can absorb thousands of requests per minute from one IP address. Without rate limiting, that scenario degrades your service for legitimate users, exhausts database connection pools, and opens the door to credential stuffing and brute-force attacks. This tutorial walks you through implementing rate limiting in Node.js with Express, covering everything from a basic in-memory setup to Redis-backed distributed limiting across multiple servers, in 12 steps and about 30 minutes.

The express-rate-limit package (version 8.3.1, the latest as of June 2026 with no known vulnerabilities according to the Snyk vulnerability database) is the standard starting point for Express-based rate limiting. For production deployments with multiple Node.js processes or servers, you will extend it with an external Redis store. By the end of this tutorial you will have a working Express application with global limits, per-route limits, authentication endpoint protection, tiered plan limits, and full test coverage.

What Rate Limiting Prevents

Rate limiting is one layer in a defense-in-depth strategy. The OWASP API Security Top 10 (API4:2023, Unrestricted Resource Consumption) specifically calls out the absence of rate limiting as a critical API vulnerability. The Node.js Best Practices repository, one of the most widely cited security resources in the Node.js ecosystem, recommends rate limiting as a core defense against denial-of-service and brute-force attacks. On June 17, 2026, the Node.js project released security updates for the 26.x, 24.x, and 22.x release lines, a reminder that the runtime security surface is actively maintained. Rate limiting is a developer-controlled defense that works regardless of the runtime version you are on.

Here is what rate limiting directly prevents in a Node.js application:

  • Brute-force attacks on login endpoints: an attacker trying thousands of password combinations per second hits the rate limit after a small number of attempts and receives a 429 response for the rest of the window.
  • Credential stuffing: automated bots replaying leaked username/password pairs slow to a crawl under per-IP limits, making large-scale stuffing campaigns economically unviable against a protected endpoint.
  • API abuse and scraping: scrapers and aggressive clients hammering your data endpoints stop when they receive 429 Too Many Requests responses, reducing server load without blocking legitimate users.
  • Accidental denial of service: a poorly coded client stuck in a tight retry loop can exhaust your server when no rate limiting is in place. In many production systems this scenario is more common than intentional attacks.
  • Resource exhaustion: database connection pools, memory, and CPU are finite. Uncontrolled request volume can deplete these resources and cause cascading failures even when no single request is expensive on its own.
  • Password reset and OTP abuse: endpoints that trigger email or SMS delivery need strict rate limits, or an attacker can use them to spam arbitrary recipients at your expense.

Rate limiting is not a complete solution on its own. It works alongside authentication, input validation, CSRF protection, and secure session management to form a layered defense. The sections below build each component incrementally, starting with the simplest working configuration and adding complexity only where the use case requires it.

Rate Limiting Algorithms Explained

Before writing code, you need to understand what algorithm you are applying. The four main approaches each have distinct trade-offs that affect which one you should use for a given situation. Choosing the wrong algorithm can leave you overly permissive (allowing boundary bursts an attacker can exploit) or overly restrictive (penalizing legitimate users who make normal bursting patterns).

Algorithm | Burst Tolerance | Fairness | Memory Cost | Best Use Case
Fixed Window | Allows 2x burst at window boundary | Low | Low | Simple global API protection
Sliding Window | Controlled bursts only | High | Medium | Login endpoints, fair per-user limits
Token Bucket | High (burst up to bucket size) | Medium | Low | APIs with burst-friendly SLAs
Leaky Bucket | None (constant output rate) | High | Low | Downstream service protection

Fixed Window Counter

The fixed window algorithm divides time into discrete buckets (for example, 15-minute windows) and counts requests within each bucket. When the count hits the limit, it rejects further requests until the next window opens. The well-known weakness is the boundary burst problem: a client can send the full quota in the final second of one window and again in the first second of the next, sending roughly twice the configured limit in a very short burst. Despite this limitation, the fixed window is the default behavior for express-rate-limit and is well-suited for coarse-grained protection where simplicity matters more than burst precision. For most public APIs and general traffic limiting, the fixed window is the right starting point.
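The boundary burst is easy to reproduce with a small counter. The sketch below uses a hypothetical FixedWindowLimiter class written for illustration; it is not the implementation inside express-rate-limit, and it takes the current time as an argument so the behavior is deterministic.

```javascript
// Minimal fixed-window counter (illustrative only).
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should receive a 429.
  allow(key, now = Date.now()) {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request of a new window: reset the counter
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 5 requests in the last millisecond of one window, 5 more at the start of the next
const fixedLimiter = new FixedWindowLimiter(5, 60_000);
let fixedAllowed = 0;
for (let i = 0; i < 5; i++) if (fixedLimiter.allow('1.2.3.4', 59_999)) fixedAllowed++;
for (let i = 0; i < 5; i++) if (fixedLimiter.allow('1.2.3.4', 60_000)) fixedAllowed++;
console.log(fixedAllowed); // 10: twice the configured limit within 2 ms
```

A sixth request at the same instant in the second window is rejected, so the limit still holds inside each individual window.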

Sliding Window Log

The sliding window approach tracks the exact timestamp of each request and counts how many occurred within the last N minutes at any given moment. This eliminates the boundary burst problem because the window moves continuously rather than resetting at fixed intervals. The cost is additional memory: each request entry must be stored until it ages out of the window, and atomic operations against a Redis sorted set are required for distributed correctness. For login endpoints and sensitive operations where burst fairness matters, the extra precision and operational overhead are worth the cost. When combined with a Redis store that implements atomic sliding window counters, express-rate-limit supports this approach transparently through the store abstraction layer.
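For comparison, here is a minimal in-memory sketch of a sliding window log. The SlidingWindowLog class is a hypothetical name used for illustration; a distributed version would keep the timestamps in Redis rather than in a Map, and this sketch assumes rejected requests are not recorded.

```javascript
// Minimal sliding-window log (illustrative only, single process).
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // key -> array of request timestamps
  }

  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that are still inside the moving window
    const recent = (this.logs.get(key) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.logs.set(key, recent);
    return allowed;
  }
}

// Same traffic pattern that doubles the quota under a fixed window
const sliding = new SlidingWindowLog(5, 60_000);
let slidingAllowed = 0;
for (let i = 0; i < 5; i++) if (sliding.allow('1.2.3.4', 59_999)) slidingAllowed++;
for (let i = 0; i < 5; i++) if (sliding.allow('1.2.3.4', 60_000)) slidingAllowed++;
console.log(slidingAllowed); // 5: the second burst is rejected

const afterWindow = sliding.allow('1.2.3.4', 120_000);
console.log(afterWindow); // true: the first burst has aged out
```

The memory cost is visible in the logs map: every accepted request is stored until it leaves the window.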

Token Bucket and Leaky Bucket

The token bucket model gives each client a bucket that refills at a fixed rate. Clients consume one token per request and receive 429 errors when the bucket is empty. This design tolerates short legitimate bursts (up to the bucket capacity) while enforcing a long-term average throughput rate. It is a good fit for APIs where you want to accommodate clients that batch requests occasionally without penalizing them for normal usage patterns that include brief spikes.
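The refill logic can be sketched in a few lines. This is a single hypothetical TokenBucket instance for illustration; a real limiter would keep one bucket per client key, and the capacity and refill rate below are arbitrary example values.

```javascript
// Minimal token bucket (illustrative only).
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Add the tokens earned since the last call, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false; // bucket empty: caller responds with 429
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(10, 1, 0); // burst of 10, then 1 request per second
const burstResults = Array.from({ length: 12 }, () => bucket.allow(0));
console.log(burstResults.filter(Boolean).length); // 10

const afterOneSecond = bucket.allow(1_000);
const immediatelyAgain = bucket.allow(1_000);
console.log(afterOneSecond, immediatelyAgain); // true false
```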

The leaky bucket inverts this logic: incoming requests enter a queue and drain at a constant output rate regardless of input rate. Leaky bucket is preferred when you want smooth throughput to a downstream service and cannot afford any burst reaching it. The rate-limiter-flexible npm package implements both token bucket and leaky bucket models and supports Redis, MongoDB, MySQL, and PostgreSQL as backends, making it the right choice when express-rate-limit’s fixed-window model is not expressive enough for your use case.
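A queue-based leaky bucket can be sketched as follows. The LeakyBucket class is a hypothetical illustration and is unrelated to the rate-limiter-flexible API; it assumes the caller invokes leak() once per drain interval, for example from a setInterval timer.

```javascript
// Minimal queue-based leaky bucket (illustrative only).
class LeakyBucket {
  constructor(capacity) {
    this.capacity = capacity; // maximum number of queued requests
    this.queue = [];
  }

  // Returns false when the queue is full; the caller would respond with 429
  offer(request) {
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(request);
    return true;
  }

  // Called once per drain interval; releases at most one request (oldest first)
  leak() {
    return this.queue.shift();
  }
}

const leaky = new LeakyBucket(3);
const offered = ['a', 'b', 'c', 'd'].map((r) => leaky.offer(r));
console.log(offered); // [ true, true, true, false ]

const firstOut = leaky.leak();
console.log(firstOut); // a
```

Because only leak() hands work to the downstream service, its call frequency alone sets the output rate, no matter how quickly requests arrive.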

Prerequisites

Before starting, make sure you have the following in place:

  • Node.js 22.x or 24.x (both are supported LTS release lines as of June 2026 and receive security fixes from the Node.js project; the June 17, 2026 security release covers both lines)
  • npm 10.x or later (ships bundled with Node.js 22+)
  • Express 4.x or 5.x
  • Redis 7.x (required from Step 8 onward; the first seven steps use only in-memory storage and do not need Redis)
  • Basic familiarity with Express middleware: how app.use() works and the (req, res, next) function signature
  • A terminal and a text editor

All commands in this tutorial use npm. If you prefer yarn or pnpm, substitute the equivalent commands. The final project is a self-contained Express application you can run with node src/server.js or extend as the foundation for a real production service. Estimated time: 25 to 35 minutes for a developer familiar with Node.js and Express.

Step 1: Initialize the Project and Install Dependencies

Create a project directory and initialize it as a Node.js package:

mkdir rate-limit-demo && cd rate-limit-demo
npm init -y
mkdir src src/__tests__

Install the core runtime and development dependencies:

npm install express express-rate-limit express-slow-down
npm install --save-dev jest supertest

express-rate-limit is at version 8.3.1 as of June 2026 with no known vulnerabilities in that version. The package provides basic IP rate-limiting middleware for Express and supports a pluggable store interface for external backends. express-slow-down is the official companion package that implements progressive throttling (adding delays rather than rejecting requests outright). supertest handles integration testing in Step 11 by letting you send HTTP requests directly to the Express app without binding to a real port.

Update the scripts section of package.json now so you can run tests throughout development:

{
  "scripts": {
    "start": "node src/server.js",
    "test": "jest --testTimeout=30000"
  }
}

Step 2: Build the Base Express Server

Separate the Express application module from the server entry point. This pattern is essential for testability: test files can import the app module without binding to a port, and server.js handles the actual listening. Without this separation, running multiple test files that import the same server would cause port conflicts.

Create src/app.js:

// src/app.js
const express = require('express');

const app = express();
app.use(express.json());

// Health check route: registered BEFORE the global rate limiter
app.get('/health', (req, res) => res.json({ status: 'ok' }));

// Public read endpoints
app.get('/api/products', (req, res) => {
  res.json({ products: ['Widget A', 'Widget B', 'Widget C'] });
});

app.get('/api/search', (req, res) => {
  const { q } = req.query;
  res.json({ results: [], query: q });
});

// Write endpoint
app.post('/api/products', (req, res) => {
  res.status(201).json({ message: 'Product created', data: req.body });
});

// Auth endpoint
app.post('/api/auth/login', (req, res) => {
  const { username, password } = req.body;
  if (username === 'admin' && password === 'secret') {
    return res.json({ token: 'example-jwt-token' });
  }
  res.status(401).json({ error: 'Invalid credentials' });
});

module.exports = app;

Create src/server.js:

// src/server.js
const app = require('./app');
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Server running on port ${PORT}`));

Start the server with node src/server.js and confirm the health endpoint responds: curl http://localhost:3000/health should return {"status":"ok"}. Verify /api/products returns a 200 before moving on to the rate limiting steps.

Step 3: Apply a Global Rate Limiter

The global limiter is your first line of defense. It catches any client sending excessive request volume across any route, before any route handler or business logic runs. Register it in src/app.js after the health check route (so the health check is excluded) but before all other route definitions:

const { rateLimit } = require('express-rate-limit');

// Global limiter: 100 requests per 15 minutes per IP
const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,    // 15-minute window
  limit: 100,                   // max requests per window per IP
  standardHeaders: 'draft-7',   // emit IETF standard RateLimit headers
  legacyHeaders: false,         // disable X-RateLimit-* legacy headers
  message: {
    status: 429,
    error: 'Too Many Requests',
    message: 'You have exceeded the rate limit. Please wait 15 minutes before retrying.',
  },
});

// Apply AFTER health check, BEFORE all other routes
app.use(globalLimiter);

The standardHeaders: 'draft-7' option emits the IETF draft-standard headers on every response: a combined RateLimit header that carries the limit, remaining, and reset values (for example, limit=100, remaining=73, reset=347) and a RateLimit-Policy header. The separate RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers belong to the older 'draft-6' setting. Either way, API consumers get machine-readable quota information, which reduces unnecessary retries and simplifies debugging. Setting legacyHeaders: false prevents sending both the old X-RateLimit-* format and the new format simultaneously, which would create confusing duplicates in API clients that parse headers.
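On the client side, a consumer can read its remaining quota from these headers. The sketch below assumes the combined draft-7 style value of the form limit=100, remaining=73, reset=347; parseRateLimitHeader is a hypothetical helper, not part of express-rate-limit.

```javascript
// Parses a combined RateLimit header value into numeric fields (illustrative only).
function parseRateLimitHeader(value) {
  const fields = {};
  for (const part of String(value || '').split(',')) {
    const [name, raw] = part.trim().split('=');
    if (!name || raw === undefined || raw === '') continue; // skip malformed parts
    const n = Number(raw);
    if (Number.isFinite(n)) fields[name] = n;
  }
  return fields;
}

const quota = parseRateLimitHeader('limit=100, remaining=73, reset=347');
console.log(quota); // { limit: 100, remaining: 73, reset: 347 }
```

A client could use quota.remaining to pause before it hits the limit and quota.reset (seconds until the window resets) to schedule its next request.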

Verify the limiter with a shell loop. Run this against your running server:

for i in $(seq 1 105); do
  echo -n "Request $i: "
  curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/products
  echo
done

Expected output: requests 1 through 100 print 200. Requests 101 through 105 print 429. The response body for rate-limited requests will be the JSON error message you configured.

Step 4: Protect Authentication Endpoints

Authentication endpoints need a separate, much stricter limit than your general API routes. A global limit of 100 requests per 15 minutes still allows an attacker to try 100 password combinations before being blocked. To make brute-force and credential stuffing attempts impractical, the auth-specific limit needs to be in the single digits per window.

The critical option for auth limiters is skipSuccessfulRequests: true, which ensures that only failed authentication attempts count toward the limit. A legitimate user who logs in successfully does not consume quota. By default, any response with a status code of 400 or higher counts as a failed attempt:

// Auth limiter: 5 failed attempts per 15 minutes per IP
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 5,
  skipSuccessfulRequests: true,   // successful logins don't consume quota
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  message: {
    status: 429,
    error: 'Too Many Authentication Attempts',
    message: 'Too many failed login attempts. Please wait 15 minutes before trying again.',
  },
});

// Update the login route to use the auth-specific limiter
app.post('/api/auth/login', authLimiter, (req, res) => {
  const { username, password } = req.body;
  if (username === 'admin' && password === 'secret') {
    return res.json({ token: 'example-jwt-token' });
  }
  res.status(401).json({ error: 'Invalid credentials' });
});

This pattern aligns with the Node.js best practices guidance for brute-force defense: restrict a specific IP after a configurable number of failed attempts, then require a mandatory timeout. Apply the same approach to any endpoint that delivers OTP codes, password reset emails, or SMS messages, since those endpoints can be used to spam arbitrary recipients if they have no rate limit. A good default is 3 to 5 attempts per 15-minute window for anything that delivers a message or checks a credential.

For more granular brute-force defense, you can combine IP-based limiting with username-based limiting by changing the keyGenerator option to return the submitted username instead of (or in addition to) the IP address. This catches distributed attacks where one attacker sends credential pairs from many different IP addresses, all targeting the same account.
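A username-based key could look like the following sketch. The loginKey function is a hypothetical helper; it assumes express.json() has already parsed the body and that the login payload carries a username field, as in the demo route.

```javascript
// Builds a rate-limit key from the submitted username, falling back to the IP
// when no username was sent (illustrative only).
function loginKey(req) {
  const username = typeof req.body?.username === 'string'
    ? req.body.username.trim().toLowerCase() // normalize so "Admin " and "admin" share a bucket
    : '';
  return username ? `user:${username}` : `ip:${req.ip}`;
}

const keyForUser = loginKey({ body: { username: ' Admin ' }, ip: '203.0.113.7' });
const keyForAnonymous = loginKey({ body: {}, ip: '203.0.113.7' });
console.log(keyForUser);      // user:admin
console.log(keyForAnonymous); // ip:203.0.113.7
```

Passing a function like this as keyGenerator on a second limiter, stacked next to the IP-based authLimiter, limits failures per account as well as per address. The trade-off is that an attacker can deliberately lock a known username out for the length of the window, so the per-account limit is usually set somewhat higher than the per-IP one.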

Step 5: Add Route-Level Rate Limits

Different endpoints carry different resource costs and risk profiles. A full-text search or vector similarity query can be 10 to 100 times more expensive per request than a simple product listing that returns cached data. Applying identical limits to both endpoints either leaves expensive ones dangerously under-protected or makes cheap ones unnecessarily restrictive for normal browsing patterns.

Define separate limiters for different route groups based on actual resource cost and sensitivity:

// Strict limiter for expensive search operations
const searchLimiter = rateLimit({
  windowMs: 60 * 1000,    // 1-minute window
  limit: 10,              // 10 search requests per minute per IP
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  message: {
    status: 429,
    error: 'Search Rate Limit Exceeded',
    message: 'Maximum 10 searches per minute. Please reduce your request frequency.',
  },
});

// Moderate limiter for write operations
const writeLimiter = rateLimit({
  windowMs: 60 * 1000,
  limit: 30,
  standardHeaders: 'draft-7',
  legacyHeaders: false,
});

// Apply per-route limiters
app.get('/api/search', searchLimiter, (req, res) => {
  const { q } = req.query;
  res.json({ results: [], query: q });
});

app.post('/api/products', writeLimiter, (req, res) => {
  res.status(201).json({ message: 'Product created' });
});

You can also apply a limiter to an entire Express Router, which avoids repeating it on every route definition within that group:

const apiRouter = express.Router();

// Apply write limiter to all routes in this router
apiRouter.use(writeLimiter);

apiRouter.get('/users', (req, res) => res.json({ users: [] }));
apiRouter.get('/orders', (req, res) => res.json({ orders: [] }));
apiRouter.delete('/users/:id', (req, res) => res.json({ deleted: req.params.id }));

app.use('/api', apiRouter);

When stacking multiple limiters on the same route (global plus route-specific), both apply independently. A request must stay within both limits. This lets you define a conservative global baseline and tighten it for sensitive routes without duplicating the baseline configuration on every route definition.

Step 6: Build Custom Error Responses and Rate Limit Headers

A bare 429 status with no body context frustrates API consumers and generates unnecessary support tickets. A well-formed rate limit error tells the client what limit was hit, how many requests remain, and exactly when they can retry. The handler option gives you full control over the error response and lets you add structured logging at the same time:

const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 100,
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    // Structured log for monitoring and alerting pipelines
    console.error(JSON.stringify({
      event:      'rate_limit_exceeded',
      ip:         req.ip,
      path:       req.path,
      method:     req.method,
      userAgent:  req.headers['user-agent'],
      apiKey:     req.headers['x-api-key'] ? '[present]' : '[absent]',
      windowMs:   options.windowMs,
      limit:      options.limit,
      timestamp:  new Date().toISOString(),
    }));

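    // Seconds until the current window resets; fall back to the full window
    // length if the store does not report a reset time
    const resetTime = req.rateLimit?.resetTime;
    const retryAfterSeconds = resetTime
      ? Math.max(1, Math.ceil((resetTime.getTime() - Date.now()) / 1000))
      : Math.ceil(options.windowMs / 1000);

    // Set Retry-After explicitly so the header always matches the JSON body
    res.set('Retry-After', String(retryAfterSeconds));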

    res.status(options.statusCode).json({
      status: 429,
      error: 'Too Many Requests',
      message: 'You have exceeded the rate limit for this endpoint.',
      retryAfter: retryAfterSeconds,
    });
  },
});

The table below shows the full set of rate limit response headers emitted with standardHeaders: 'draft-7'. These headers appear on every response, not just on 429s, so clients can monitor their remaining quota proactively:

Header | Example Value | Description
RateLimit-Policy | 100;w=900 | The enforced policy: 100 requests per 900-second window
RateLimit | limit=100, remaining=73, reset=347 | Combined header: maximum requests in the window, requests left, and seconds until the window resets
Retry-After | 347 | Seconds to wait before retrying (sent on 429 responses only, set manually in handler)

Structured JSON logs from the handler callback integrate directly with log aggregation tools like Datadog, Grafana Loki, and AWS CloudWatch Logs Insights. A spike in rate_limit_exceeded events on your auth endpoint is one of the clearest early warning signals of a credential stuffing campaign. Configure an alert on this metric with a threshold above your normal baseline before deploying to production.
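Before wiring up a real alerting pipeline, the same signal can be checked by hand. The sketch below assumes newline-delimited JSON log lines with the event and path fields produced by the handler above; countRateLimitEvents is a hypothetical helper name.

```javascript
// Counts rate_limit_exceeded events per path from JSON log lines (illustrative only).
function countRateLimitEvents(lines) {
  const counts = {};
  for (const line of lines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not JSON
    }
    if (entry?.event !== 'rate_limit_exceeded') continue;
    counts[entry.path] = (counts[entry.path] || 0) + 1;
  }
  return counts;
}

const sampleLines = [
  '{"event":"rate_limit_exceeded","path":"/api/auth/login"}',
  '{"event":"rate_limit_exceeded","path":"/api/auth/login"}',
  '{"event":"rate_limit_exceeded","path":"/api/search"}',
  'Server running on port 3000',
];
const eventCounts = countRateLimitEvents(sampleLines);
console.log(eventCounts); // { '/api/auth/login': 2, '/api/search': 1 }
```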

Step 7: Add Progressive Throttling with express-slow-down

Hard rate limits create a sharp cliff: the first 100 requests get full-speed responses, and the 101st receives an immediate rejection. express-slow-down provides a softer alternative by adding incremental delays to responses as a client approaches the limit. This degrades scrapers and bots gradually rather than cutting them off abruptly, and it gives legitimate clients who burst briefly a natural opportunity to slow down before hitting the hard rejection ceiling.

const { slowDown } = require('express-slow-down');

const speedLimiter = slowDown({
  windowMs: 15 * 60 * 1000,      // 15-minute window
  delayAfter: 50,                // start adding delays after 50 requests
  delayMs: (used, req) => (used - req.slowDown.limit) * 100, // add 100ms per request above delayAfter
  maxDelayMs: 5000,               // cap delay at 5 seconds per request
});

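// Apply slow-down first, then the hard limit. This line replaces the earlier
// app.use(globalLimiter) call so globalLimiter counts each request only once.
// A request goes through speed limiter, then global limiter, then route handler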
app.use('/api', speedLimiter, globalLimiter);

With this configuration, requests 1 through 50 return immediately. Request 51 is delayed 100 milliseconds. Request 60 is delayed 1,000 milliseconds (1 second). The delay grows linearly until it reaches the 5-second cap. When a client exceeds the hard limit at 100 requests, they receive a 429 rejection for the remainder of the 15-minute window.
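The delay schedule described above is plain arithmetic, sketched here as a standalone function. delayForRequest is a hypothetical helper that mirrors the delayAfter, 100 ms step, and maxDelayMs values from the configuration; it does not call express-slow-down.

```javascript
// Delay in milliseconds applied to the nth request of a window (illustrative only).
function delayForRequest(n, { delayAfter = 50, stepMs = 100, maxDelayMs = 5000 } = {}) {
  if (n <= delayAfter) return 0; // requests up to delayAfter are not delayed
  return Math.min((n - delayAfter) * stepMs, maxDelayMs);
}

const schedule = [50, 51, 60, 100].map((n) => delayForRequest(n));
console.log(schedule); // [ 0, 100, 1000, 5000 ]

// Total time a client spends waiting if it sends all 100 requests in the window
let totalDelayMs = 0;
for (let n = 1; n <= 100; n++) totalDelayMs += delayForRequest(n);
console.log(totalDelayMs); // 127500
```

With these values the 5-second cap is reached exactly at request 100, and a client that uses its whole quota accumulates about 127 seconds of added latency.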

The combination of progressive throttling plus a hard limit is particularly effective on login endpoints. A legitimate user who misremembers their password waits a few extra seconds before retrying, which is noticeable but not disruptive. A brute-force script that runs as fast as possible will slow to a crawl before reaching the hard cap, then receive 429 responses for the rest of the window, so the attacker gains very little from each request.

Step 8: Implement Redis-Backed Distributed Rate Limiting

The in-memory store works correctly for a single Node.js process. The moment you run multiple processes (via the Node.js cluster module) or multiple server instances behind a load balancer, each process maintains its own independent counter. A client who hits the rate limit on worker 1 can immediately send the same request to worker 2 and bypass the limit. The default in-memory store is explicitly not shared across processes by design, as noted in the official express-rate-limit documentation.
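The bypass is easy to see in a small simulation. The sketch below models each worker as a counter backed by its own Map, with a round-robin loop standing in for the load balancer; makeCounter is a hypothetical helper, not part of express-rate-limit.

```javascript
// Returns a function that counts hits per IP in the given store (illustrative only).
function makeCounter(limit, store = new Map()) {
  return (ip) => {
    const hits = (store.get(ip) || 0) + 1;
    store.set(ip, hits);
    return hits <= limit; // true = request allowed
  };
}

const LIMIT = 100;

// Two workers, each with an independent in-memory store
const isolatedWorkers = [makeCounter(LIMIT), makeCounter(LIMIT)];
let passedIsolated = 0;
for (let i = 0; i < 400; i++) {
  if (isolatedWorkers[i % 2]('1.2.3.4')) passedIsolated++;
}
console.log(passedIsolated); // 200: the effective limit doubled

// Two workers reading and writing one shared store
const sharedStore = new Map();
const sharedWorkers = [makeCounter(LIMIT, sharedStore), makeCounter(LIMIT, sharedStore)];
let passedShared = 0;
for (let i = 0; i < 400; i++) {
  if (sharedWorkers[i % 2]('1.2.3.4')) passedShared++;
}
console.log(passedShared); // 100: the configured limit holds
```

The effective limit scales with the number of isolated workers, which is why a shared store becomes necessary as soon as there is more than one process.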

Redis provides a shared, atomic counter that all processes and servers read and write to consistently. Install the official Redis store adapter:

npm install rate-limit-redis ioredis

Create src/redis-limiter.js:

// src/redis-limiter.js
const { rateLimit } = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis');
const Redis = require('ioredis');

const redisClient = new Redis({
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT, 10) || 6379,
  enableReadyCheck: true,
  maxRetriesPerRequest: 3,
  lazyConnect: false,
});

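redisClient.on('error', (err) => {
  // Log instead of letting an unhandled 'error' event crash the process.
  // While Redis is unreachable the store calls fail; the passOnStoreError
  // option of express-rate-limit controls whether requests then error or pass through.
  console.error('Redis client error:', err.message);
});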

const distributedLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 100,
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.call(...args),
    prefix: 'rl:global:', // namespace keys to avoid collisions
  }),
});

const distributedAuthLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  limit: 5,
  skipSuccessfulRequests: true,
  standardHeaders: 'draft-7',
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.call(...args),
    prefix: 'rl:auth:',
  }),
});

module.exports = { distributedLimiter, distributedAuthLimiter, redisClient };

The prefix option namespaces all rate-limit keys within Redis. Use distinct prefixes for each limiter (rl:global:, rl:auth:, rl:search:) to avoid key collisions and to make it easy to inspect or flush specific limiter data in the Redis CLI using SCAN with a pattern match. Update src/app.js to import and use the distributed limiters:

const { distributedLimiter, distributedAuthLimiter } = require('./redis-limiter');

app.use(distributedLimiter);

app.post('/api/auth/login', distributedAuthLimiter, (req, res) => {
  const { username, password } = req.body;
  if (username === 'admin' && password === 'secret') {
    return res.json({ token: 'example-jwt-token' });
  }
  res.status(401).json({ error: 'Invalid credentials' });
});

The rate-limit-redis package is the officially maintained adapter and is kept in sync with express-rate-limit’s store interface versioning. If you run Node.js in cluster mode on a single machine rather than separate instances, the official docs recommend @express-rate-limit/cluster-memory-store as a lighter alternative that shares state across workers through the Node.js cluster IPC channel without requiring a Redis installation.

Step 9: Configure Trust Proxy for Production

This step resolves the most common production rate limiting failure mode. When your Node.js application sits behind a reverse proxy (nginx, AWS ALB, Cloudflare, or any similar layer), Express sees the proxy’s IP address in req.ip, not the real client IP. The result: every request from every client appears to come from the same proxy IP, so the first 100 requests across all users trip the global limit and everyone receives 429 responses simultaneously.

Add one of these settings to src/app.js immediately after the app is created, before any middleware registration:

// Use exactly ONE of these settings; a later call overrides an earlier one.

// For one nginx reverse proxy
app.set('trust proxy', 1);

// For two proxy hops (e.g., Cloudflare in front of nginx)
// app.set('trust proxy', 2);

// For a specific known internal proxy IP (most restrictive)
// app.set('trust proxy', '10.0.0.1');

With trust proxy set, Express reads the real client IP from the X-Forwarded-For header chain and uses it as req.ip. Your nginx configuration must also forward the correct headers:

# nginx.conf
location /api {
    proxy_pass         http://localhost:3000;
    proxy_set_header   X-Real-IP          $remote_addr;
    proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
    proxy_set_header   X-Forwarded-Proto  $scheme;
    proxy_set_header   Host               $host;
}

Never set trust proxy to true (which trusts all forwarded headers unconditionally) unless you can guarantee that no traffic reaches your Node.js process without passing through a trusted proxy first. Setting it to true lets any client spoof their IP by sending a crafted X-Forwarded-For header, bypassing per-IP rate limits entirely. Use the numeric hop count or a specific trusted CIDR instead.

Verify the fix: add console.log('Client IP:', req.ip) to any route handler in development and confirm that clients from different networks show different IP values. If you still see the proxy IP after configuration, check that nginx is actually sending X-Forwarded-For headers and that the hop count in trust proxy matches your actual infrastructure depth.
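To reason about which hop count is right, it helps to model how the address is chosen. The function below is a simplified approximation of the numeric trust proxy behavior, not Express code; resolveClientIp and its parameters are hypothetical names.

```javascript
// Approximates how a numeric hop count selects the client address (illustrative only).
// socketAddress: the peer that connected to Node.js (usually the last proxy)
// xForwardedFor: the raw X-Forwarded-For header value
function resolveClientIp(socketAddress, xForwardedFor, trustedHops) {
  const forwarded = (xForwardedFor || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean)
    .reverse(); // nearest proxy first
  const chain = [socketAddress, ...forwarded];
  // Skip the trusted hops; the next address in the chain is treated as the client
  return chain[Math.min(trustedHops, chain.length - 1)];
}

// One nginx proxy at 10.0.0.1 in front of Node.js
const noTrust = resolveClientIp('10.0.0.1', '203.0.113.7', 0);
const oneHop = resolveClientIp('10.0.0.1', '203.0.113.7', 1);
console.log(noTrust); // 10.0.0.1 (every client looks like the proxy)
console.log(oneHop);  // 203.0.113.7

// The client sent a forged X-Forwarded-For: 6.6.6.6, and nginx appended the real address
const spoofed = resolveClientIp('10.0.0.1', '6.6.6.6, 203.0.113.7', 1);
console.log(spoofed); // 203.0.113.7 (the forged entry is ignored)
```

With a hop count that matches the real number of proxies, the forged leftmost entry is never reached. Trusting every hop would return it instead.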

Step 10: Implement Tiered Rate Limits by API Plan

Production APIs rarely apply a single limit to all clients. SaaS products typically offer multiple service tiers with different request quotas. A middleware function that reads the authenticated user’s plan and selects the appropriate rate limiter is a clean pattern that keeps tier logic centralized and avoids duplicating route definitions for each plan level.

Plan | Requests / Minute | Rate Key Strategy | Redis Prefix | Notes
Free | 10 | IP address | rl:free: | No authentication required
Basic | 100 | User ID | rl:basic: | Authentication required
Premium | 1,000 | User ID | rl:premium: | Authentication required
Enterprise | 10,000 | User ID | rl:enterprise: | Custom SLA, negotiated limits

Create src/tiered-limiter.js:

// src/tiered-limiter.js
const { rateLimit, ipKeyGenerator } = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis');
const { redisClient } = require('./redis-limiter');

const PLAN_LIMITS = {
  free:       { limit: 10,    windowMs: 60 * 1000 },
  basic:      { limit: 100,   windowMs: 60 * 1000 },
  premium:    { limit: 1000,  windowMs: 60 * 1000 },
  enterprise: { limit: 10000, windowMs: 60 * 1000 },
};

function createPlanLimiter(plan) {
  const config = PLAN_LIMITS[plan];
  return rateLimit({
    windowMs: config.windowMs,
    limit: config.limit,
    standardHeaders: 'draft-7',
    legacyHeaders: false,
    // Key by user ID for authenticated users, fall back to IP for anonymous.
    // ipKeyGenerator normalizes IPv6 addresses to their subnet so one client
    // cannot rotate through its own address range. Only rely on x-api-key
    // if the key has already been validated upstream.
    keyGenerator: (req) => req.user?.id || req.headers['x-api-key'] || ipKeyGenerator(req.ip),
    store: new RedisStore({
      sendCommand: (...args) => redisClient.call(...args),
      prefix: `rl:${plan}:`,
    }),
    message: {
      status: 429,
      error: 'Rate Limit Exceeded',
      message: `Your ${plan} plan allows ${config.limit} requests per minute.`,
      plan,
    },
  });
}

// Build one limiter per plan at module load time (not per request)
const planLimiters = Object.fromEntries(
  Object.keys(PLAN_LIMITS).map((plan) => [plan, createPlanLimiter(plan)])
);

function tieredLimiter(req, res, next) {
  const plan = req.user?.plan || 'free';
  const limiter = planLimiters[plan] || planLimiters.free;
  return limiter(req, res, next);
}

module.exports = { tieredLimiter };

Apply the tiered limiter to authenticated routes after JWT verification populates req.user:

const { tieredLimiter } = require('./tiered-limiter');

// authenticateJwt populates req.user = { id, plan, ... }
app.use('/api/protected', authenticateJwt, tieredLimiter, (req, res) => {
  res.json({ data: 'protected resource', plan: req.user.plan });
});

Note that createPlanLimiter is called once at module load time, not on every incoming request. Building a new limiter and RedisStore per request would repeat the store’s setup work against Redis on every call, and express-rate-limit’s built-in validation warns about limiters created inside a request handler. Pre-building one limiter per plan and selecting among them in tieredLimiter keeps that setup cost fixed and reuses the single shared Redis connection.

Step 11: Test Rate Limiting with curl and Jest

Rate limiting code is infrastructure code. A misconfigured limiter silently breaks all your authenticated users or silently lets attackers through without triggering any application errors. Only automated tests verify that limits behave correctly across all configurations.

Manual Verification with curl

Quick smoke test for the auth limiter. Send 6 failed login attempts when the limit is 5:

for i in $(seq 1 6); do
  echo "Attempt $i:"
  curl -s -X POST http://localhost:3000/api/auth/login \
    -H 'Content-Type: application/json' \
    -d '{"username":"wrong","password":"wrong"}' \
    -w "\nHTTP Status: %{http_code}\n"
  echo "---"
done

Expected output: attempts 1 through 5 print HTTP Status: 401 (invalid credentials). Attempt 6 prints HTTP Status: 429. The response body for attempt 6 will contain your configured error message JSON.

Automated Tests with Jest and supertest

Create src/__tests__/rate-limit.test.js:

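const request = require('supertest');

describe('Rate Limiting', () => {
  let app;

  // Reload the app before each test so every test starts with fresh
  // in-memory counters; otherwise the first test exhausts the global limit
  // and later tests receive 429 instead of the status they expect.
  beforeEach(() => {
    jest.resetModules();
    app = require('../app');
  });
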
  test('global limiter blocks after 100 requests', async () => {
    const responses = await Promise.all(
      Array.from({ length: 100 }, () => request(app).get('/api/products'))
    );
    expect(responses.every((r) => r.status === 200)).toBe(true);

    // Request 101 must be blocked
    const blocked = await request(app).get('/api/products');
    expect(blocked.status).toBe(429);
    expect(blocked.body.error).toBe('Too Many Requests');
  }, 30000);

  test('auth limiter blocks after 5 failed login attempts', async () => {
    const failedLogin = () =>
      request(app)
        .post('/api/auth/login')
        .send({ username: 'wrong', password: 'wrong' });

    for (let i = 0; i < 5; i++) {
      const res = await failedLogin();
      expect(res.status).toBe(401);
    }

    const blocked = await failedLogin();
    expect(blocked.status).toBe(429);
    expect(blocked.body.error).toBe('Too Many Authentication Attempts');
  }, 30000);

  test('successful logins do not count toward the auth limit', async () => {
    const successLogin = () =>
      request(app)
        .post('/api/auth/login')
        .send({ username: 'admin', password: 'secret' });

    for (let i = 0; i < 10; i++) {
      const res = await successLogin();
      expect(res.status).toBe(200);
    }
  }, 30000);

  test('standard rate limit headers are present on normal responses', async () => {
    const res = await request(app).get('/api/products');
    expect(res.status).toBe(200);
    // 'draft-7' sends one combined RateLimit header plus RateLimit-Policy
    expect(res.headers).toHaveProperty('ratelimit');
    expect(res.headers).toHaveProperty('ratelimit-policy');
  });

  test('health endpoint is excluded from rate limiting', async () => {
    const responses = await Promise.all(
      Array.from({ length: 200 }, () => request(app).get('/health'))
    );
    expect(responses.every((r) => r.status === 200)).toBe(true);
  }, 30000);
});

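Run the suite with npm test. The global limiter test sends 100 concurrent requests using Promise.all, which exercises the in-memory store’s state consistency under load. All five tests share one app instance and one loopback client IP, so counters carry over from test to test: once the first test exhausts the global limit, later requests in the same window can receive 429 unless you reset the counters in a beforeEach hook or load a fresh app per test (see item 4 in the Troubleshooting Guide). If you switch to the Redis store, these tests require a running Redis instance at the configured host and port. Use a separate test Redis database (for example, db: 1 in the ioredis config) so test runs never touch production data.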

Step 12: Harden with Environment Variables and Monitoring

Hard-coded rate limit values in source code are difficult to tune across different environments. A development environment needs loose limits so automated test suites do not constantly hit 429 responses. Staging needs production-equivalent limits for realistic load testing. Production needs the tightest limits with full monitoring coverage. Move all configurable values to environment variables so you can adjust them without code changes or deployments.

Create src/config.js:

// src/config.js
module.exports = {
  rateLimit: {
    global: {
      windowMs: parseInt(process.env.RATE_LIMIT_WINDOW_MS, 10) || 15 * 60 * 1000,
      max:      parseInt(process.env.RATE_LIMIT_MAX, 10)       || 100,
    },
    auth: {
      windowMs: parseInt(process.env.AUTH_RATE_WINDOW_MS, 10)  || 15 * 60 * 1000,
      max:      parseInt(process.env.AUTH_RATE_MAX, 10)        || 5,
    },
    search: {
      windowMs: parseInt(process.env.SEARCH_RATE_WINDOW_MS, 10) || 60 * 1000,
      max:      parseInt(process.env.SEARCH_RATE_MAX, 10)       || 10,
    },
  },
  redis: {
    host: process.env.REDIS_HOST || 'localhost',
    port: parseInt(process.env.REDIS_PORT, 10) || 6379,
  },
};
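One caveat with the parseInt(...) || default pattern: a malformed value such as abc silently falls back to the default, and a negative number passes through unchanged. The sketch below is one possible stricter variant, assuming you would rather fail fast at startup on bad configuration. positiveIntFromEnv is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: read a positive integer from an env-like object,
// falling back to a default only when the variable is unset or empty.
function positiveIntFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;

  // Accept digits only, so values such as "5000ms" or "-5" are rejected.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }

  const value = Number(raw);
  if (value === 0) {
    throw new Error(`${name} must be greater than zero`);
  }
  return value;
}

// Usage sketch: the same shape as the global block in src/config.js.
const globalLimits = {
  windowMs: positiveIntFromEnv(process.env, 'RATE_LIMIT_WINDOW_MS', 15 * 60 * 1000),
  max: positiveIntFromEnv(process.env, 'RATE_LIMIT_MAX', 100),
};
```

Whether to throw or to log and fall back is a judgment call; throwing makes a typo in a deployment manifest visible immediately instead of quietly running with default limits.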

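For test environments, override the window sizes to something short enough to test within a normal CI run. If your Jest setup loads these overrides, derive the thresholds in the test file from src/config.js instead of hard-coding 100 and 5, otherwise the assertions no longer match the configured limits. Create a .env.test file (add it to .gitignore):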

RATE_LIMIT_WINDOW_MS=5000
RATE_LIMIT_MAX=10
AUTH_RATE_WINDOW_MS=5000
AUTH_RATE_MAX=3

For monitoring, the structured JSON logs from your handler callbacks give you everything needed to set up a useful production alert. In a log aggregation system, create a rate metric for the count of rate_limit_exceeded events per minute on your auth endpoint and alert when that metric exceeds twice the normal baseline. Attacks often start at low volume (a few 429s per hour) before ramping up significantly. An early alert gives you time to block specific IP ranges at the infrastructure layer before the attack escalates. The Node.js project also explicitly recommends pairing application-level rate limiting with infrastructure-level controls (nginx, cloud firewalls, cloud load balancers) for larger and more critical systems, as documented in the Node.js security releases accompanying material.
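The alert rule described above can be sketched as a small function. This is an illustration of the arithmetic only, assuming you already have the timestamps of rate_limit_exceeded events and a measured baseline; countRecent and shouldAlert are hypothetical names, and a real log aggregation system would express the same rule in its own query language.

```javascript
// Count events whose timestamp falls inside the trailing window.
function countRecent(timestampsMs, nowMs, windowMs = 60 * 1000) {
  return timestampsMs.filter((t) => t > nowMs - windowMs && t <= nowMs).length;
}

// Alert when the current per-minute count exceeds twice the baseline.
function shouldAlert(timestampsMs, nowMs, baselinePerMinute) {
  return countRecent(timestampsMs, nowMs) > 2 * baselinePerMinute;
}

// Five blocked auth requests in the last minute against a baseline of two.
const now = 120000;
const events = [61000, 70000, 80000, 90000, 119000];
console.log(shouldAlert(events, now, 2)); // true, because 5 > 4
```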

5 Common Pitfalls to Avoid

These are the mistakes that appear most often in production rate limiting deployments. Each one quietly negates the protection you think you have put in place, and most of them produce no error or warning in development because the conditions that trigger them only exist in production infrastructure.

  1. Missing trust proxy configuration behind a reverse proxy. Deploying behind nginx or a cloud load balancer without app.set('trust proxy', 1) makes every request appear to come from the proxy’s IP. All users share a single rate limit bucket, so once the combined traffic from all clients reaches 100 requests in a window, every user on the platform is blocked at the same time. This is the most common and most damaging production failure for rate limiting. Verify correct behavior by logging req.ip and confirming you see actual client IPs from different network ranges.
  2. Using the in-memory store with multiple Node.js processes or instances. Each worker or server instance maintains an independent counter with the default memory store. A client who hits the limit on worker 1 immediately bypasses it by having the next request routed to worker 2. Fix: use @express-rate-limit/cluster-memory-store for Node.js cluster on one machine, or the Redis store for multiple separate server instances.
  3. Applying the same limit to all endpoints regardless of resource cost. A search endpoint that triggers a full-text database query costs orders of magnitude more than a health check. Applying a global 100-request limit to both either leaves expensive endpoints dangerously exposed or makes cheap endpoints restrictive for normal browsing patterns. Define separate limiters by endpoint group based on actual measured resource cost.
  4. Including health check and monitoring endpoints under the global limiter. Load balancers, container orchestrators (Kubernetes liveness probes), and monitoring agents hit /health endpoints continuously, often 10 to 30 times per minute. Including them under the global rate limiter means that during a traffic spike, your monitoring infrastructure starts receiving 429 responses, making health checks appear to fail and potentially triggering unnecessary container restarts or alert pages. Register the health check route before the global limiter middleware, or use skip: (req) => req.path === '/health'.
  5. Omitting skipSuccessfulRequests on authentication endpoints. Without this option, a user who legitimately logs in and out more than 5 times within one 15-minute window (for example, while switching between accounts) trips the rate limit even though they have done nothing harmful. The option ensures only failed authentication attempts count against the quota: penalize repeated failures, not successful usage. This is the correct semantic for login protection.
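Pitfall 2 is easy to underestimate, so here is a small simulation of it. It is a sketch in plain JavaScript with no Express involved: CounterStore and countAllowed are hypothetical names, the round-robin routing is an assumption about the load balancer, and the client IP is a documentation address.

```javascript
// Minimal counter standing in for a rate limit store (one window only).
class CounterStore {
  constructor() {
    this.hits = new Map();
  }
  increment(key) {
    const next = (this.hits.get(key) || 0) + 1;
    this.hits.set(key, next);
    return next;
  }
}

// Send `total` requests from one client, round-robin across workers,
// and count how many are allowed under `limit`.
function countAllowed(workerStores, total, limit) {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    const store = workerStores[i % workerStores.length];
    if (store.increment('203.0.113.7') <= limit) allowed++;
  }
  return allowed;
}

// Four workers, each with its own in-memory store.
const isolated = [new CounterStore(), new CounterStore(), new CounterStore(), new CounterStore()];
// Four workers pointing at one shared store, as a Redis-backed setup would.
const shared = new CounterStore();

console.log(countAllowed(isolated, 1000, 100)); // 400
console.log(countAllowed([shared, shared, shared, shared], 1000, 100)); // 100
```

With independent stores, the effective limit is the configured limit multiplied by the number of workers, which is why the limit appears to work in single-process development and then quietly loosens in production.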

Troubleshooting Guide

When rate limiting does not behave as expected, the root cause is almost always one of four things: IP resolution, middleware ordering, store configuration, or test isolation. Work through this list systematically before making changes to rate limit values.

  1. All users get rate limited after the first client hits the limit. Root cause: trust proxy is not configured, so Express sees the reverse proxy’s IP for every request. Fix: add app.set('trust proxy', 1) before any middleware registration and verify with console.log(req.ip) that you see real client IPs from different network ranges in your logs.
  2. The rate limiter has no effect, even after thousands of requests. Root cause: the limiter middleware is registered after the route handler that responds to the request, so the route sends a response before the limiter can check the count. Fix: move the limiter registration above the route or router definition it is protecting, or pass it as the first argument to a specific route handler.
  3. Redis connection errors prevent the application from starting. Root cause: ioredis throws an unhandled error when enableReadyCheck is true and Redis is unavailable at startup. Fix: add an error event listener on the Redis client (redisClient.on('error', handler)) to prevent unhandled rejection crashes. For development environments where Redis is optional, add a fallback to the in-memory store when the connection fails.
  4. Rate limits from one test case bleed into the next test case. Root cause: the in-memory store persists across test cases when the same app module instance is reused within a test file. Fix: call limiter.resetKey(ip) in a beforeEach hook, use a fresh app instance per test file, or set a very short windowMs in the test environment so limits expire naturally between tests.
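  5. The Retry-After header is missing from 429 responses. Root cause: current express-rate-limit releases set Retry-After on blocked responses only while standardHeaders or legacyHeaders is enabled, so the header disappears when both are turned off, and a proxy or later middleware can also strip it. Note that standardHeaders: 'draft-7' reports the reset as seconds remaining inside the combined RateLimit header, not as a Unix timestamp in a separate RateLimit-Reset header. Fix: keep one of the header options enabled, or set the header explicitly in your custom handler callback with res.set('Retry-After', String(Math.ceil(options.windowMs / 1000))). That value is the full window length, which is a safe upper bound on the wait.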
  6. Authenticated users from the same company all share one rate limit bucket. Root cause: a corporate office where hundreds of employees share a single public NAT IP address. An IP-based limit of 100 per 15 minutes means 100 collective requests from the entire company trip the limit. Fix: switch the keyGenerator to return the authenticated user ID or API key for authenticated routes, falling back to IP only for unauthenticated access.
  7. Rate limits work locally but fail silently in production or CI. Root cause: different trust proxy settings, different store types (in-memory locally, Redis in CI), or missing X-Forwarded-For forwarding in the nginx configuration. Fix: standardize all configuration via environment variables, ensure CI runs against a Redis instance with the same prefix configuration as production, and add a test that verifies req.ip reflects the expected value under the proxy setup.
  8. Internal services like monitoring agents are getting rate limited. Root cause: the global limiter applies to all routes including those only called by internal infrastructure. Fix: use the skip function option to bypass the limiter for known internal IP ranges or specific paths: skip: (req) => req.ip === '127.0.0.1' || req.path.startsWith('/internal'). Alternatively, register internal routes before the global limiter middleware.
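For item 8, note that the loopback address does not always appear as 127.0.0.1: depending on how the socket is bound, req.ip can also be ::1 or ::ffff:127.0.0.1. A minimal sketch of a skip predicate that covers those forms follows. skipInternalTraffic is a hypothetical name, and the /health and /internal paths are taken from the examples in this guide rather than from any convention of the library.

```javascript
// Loopback can show up in several forms depending on how the socket is bound.
const LOOPBACK_ADDRESSES = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

// Hypothetical predicate intended for the `skip` option of express-rate-limit.
function skipInternalTraffic(req) {
  if (req.path === '/health') return true;
  // Match /internal and anything below it, but not /internal-docs.
  if (req.path === '/internal' || req.path.startsWith('/internal/')) return true;
  return LOOPBACK_ADDRESSES.has(req.ip);
}
```

Two cautions apply. If a reverse proxy runs on the same host and trust proxy is not configured, every request arrives from loopback and this predicate would skip all of them. Requests made through supertest also arrive from loopback, so the loopback rule would bypass the limiter in the Step 11 tests; consider enabling it only in production.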

Advanced Tips

Use rate-limiter-flexible for Complex Scenarios

The rate-limiter-flexible package handles scenarios that express-rate-limit cannot address cleanly: points-based limits where each endpoint has a different consumption weight (a search costs 5 points, a read costs 1 point, a write costs 10 points), simultaneous multi-window limits (such as 10 per second and 1,000 per hour enforced in parallel), progressive penalties that increase the block duration after repeated violations, and transparent in-memory fallback when Redis becomes unavailable. If your API billing model or fair-use policy is more complex than a simple per-endpoint count, rate-limiter-flexible is worth the additional setup. It supports Redis, MongoDB, MySQL, and PostgreSQL as backends, making it the right choice for teams that already run a relational database but do not operate Redis.
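To make the points-based idea concrete, here is a concept sketch in plain JavaScript. It is not the rate-limiter-flexible API: PointsLimiter is a hypothetical class that uses a simple fixed window, and the weights are the ones from the paragraph above. The clock is injectable so the window can be tested without waiting.

```javascript
// Concept sketch only: a fixed-window, points-based limiter.
class PointsLimiter {
  constructor({ points, durationMs, now = Date.now }) {
    this.points = points;         // budget per window
    this.durationMs = durationMs; // window length
    this.now = now;               // injectable clock for tests
    this.buckets = new Map();     // key -> { used, resetAt }
  }

  // Try to spend `cost` points for `key`; report whether it was allowed.
  consume(key, cost = 1) {
    const t = this.now();
    let bucket = this.buckets.get(key);
    if (!bucket || t >= bucket.resetAt) {
      bucket = { used: 0, resetAt: t + this.durationMs };
      this.buckets.set(key, bucket);
    }
    if (bucket.used + cost > this.points) {
      return { allowed: false, remaining: this.points - bucket.used };
    }
    bucket.used += cost;
    return { allowed: true, remaining: this.points - bucket.used };
  }
}

// Weights from the text: a read costs 1, a search 5, a write 10.
const COSTS = { read: 1, search: 5, write: 10 };

const limiter = new PointsLimiter({ points: 20, durationMs: 60 * 1000 });
console.log(limiter.consume('user-1', COSTS.write));  // { allowed: true, remaining: 10 }
console.log(limiter.consume('user-1', COSTS.search)); // { allowed: true, remaining: 5 }
console.log(limiter.consume('user-1', COSTS.write));  // { allowed: false, remaining: 5 }
console.log(limiter.consume('user-1', COSTS.read));   // { allowed: true, remaining: 4 }
```

The rejected write leaves the budget untouched, so a cheaper request can still succeed in the same window. A production library adds what this sketch omits: shared storage, atomic updates, and block durations.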

Layer Application and Infrastructure Rate Limiting

For high-traffic production systems, application-level rate limiting in Express complements infrastructure-level rate limiting but does not replace it. Cloudflare, AWS API Gateway, nginx, and HAProxy all provide rate limiting that operates before traffic reaches your Node.js process. Infrastructure-level limits absorb volumetric attacks before they consume any application CPU or database resources. Application-level limits enforce business logic (plan quotas, per-user fairness, endpoint-specific costs) that infrastructure layers cannot implement without knowledge of your authentication model and route semantics. Running both layers means an attack that saturates one is still caught by the other, and a configuration mistake in one layer does not fully disable protection.

Use Dynamic Key Generators for Fine-Grained Control

The keyGenerator function in express-rate-limit can return any string, which gives you complete flexibility over how rate limit buckets are defined. Some useful patterns: key by req.user.id + ':' + req.path to give each user a separate quota per endpoint, key by req.headers['x-tenant-id'] to enforce per-tenant limits in multi-tenant applications, or key by req.headers['x-api-key'] to tie limits to API keys rather than the IP address of the machine making the request. Combine this with the Redis store so that the custom key is enforced consistently across all your Node.js workers and instances.
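A minimal sketch of such a key generator is shown below. It assumes an earlier authentication middleware sets req.user after verifying the caller, which is not shown in this guide, and rateLimitKey is a hypothetical name. The prefixes keep a user ID from ever colliding with an IP address string.

```javascript
// Hypothetical key generator for the `keyGenerator` option.
function rateLimitKey(req) {
  // Only trust identities that an earlier auth middleware has verified.
  if (req.user && req.user.id) {
    return `user:${req.user.id}:${req.path}`;
  }
  // Unauthenticated traffic falls back to the client IP.
  return `ip:${req.ip}`;
}

console.log(rateLimitKey({ user: { id: 42 }, path: '/api/search', ip: '198.51.100.4' }));
// user:42:/api/search
console.log(rateLimitKey({ path: '/api/search', ip: '198.51.100.4' }));
// ip:198.51.100.4
```

Be careful with keys taken straight from request headers such as x-api-key or x-tenant-id: if the value has not been validated first, a client can rotate arbitrary values to obtain a fresh bucket on every request. Also check the express-rate-limit documentation for its current guidance on IPv6 addresses in custom key generators.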

Frequently Asked Questions

What is the difference between rate limiting and throttling?

Rate limiting blocks requests that exceed a configured quota with a hard 429 Too Many Requests response. Throttling delays requests progressively as a client approaches the limit but still serves them, just more slowly. The express-rate-limit package implements hard rate limiting. The express-slow-down package implements progressive throttling by adding increasing delays to response times. Using both together provides layered defense: throttling degrades scrapers and bots gradually, while the hard limit enforces the absolute ceiling and stops runaway clients completely.
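The difference can be sketched as a decision function. This illustrates the two behaviors only and is not the API of express-slow-down or express-rate-limit; the option names delayAfter, stepMs, and maxDelayMs are assumptions chosen for this example.

```javascript
// Progressive throttling: no delay for the first `delayAfter` requests in a
// window, then a delay that grows linearly and is capped at `maxDelayMs`.
function throttleDelayMs(hits, { delayAfter, stepMs, maxDelayMs }) {
  if (hits <= delayAfter) return 0;
  return Math.min((hits - delayAfter) * stepMs, maxDelayMs);
}

// Combined decision: the hard limit is checked first, then the throttle.
function decide(hits, { limit, ...throttle }) {
  if (hits > limit) return { action: 'reject', status: 429 };
  const delayMs = throttleDelayMs(hits, throttle);
  return delayMs > 0 ? { action: 'delay', delayMs } : { action: 'serve' };
}

const policy = { limit: 100, delayAfter: 50, stepMs: 100, maxDelayMs: 2000 };
console.log(decide(50, policy));  // { action: 'serve' }
console.log(decide(51, policy));  // { action: 'delay', delayMs: 100 }
console.log(decide(80, policy));  // { action: 'delay', delayMs: 2000 }
console.log(decide(101, policy)); // { action: 'reject', status: 429 }
```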

Should I rate limit by IP address or by user account?

Both, applied to different route groups. IP-based limits protect unauthenticated endpoints and catch attacks before authentication runs. User ID or API-key-based limits protect authenticated endpoints and avoid penalizing all employees of a company that exits through a single corporate NAT IP address. Use IP for public and anonymous routes. Switch the keyGenerator option to return the authenticated user ID or API key once a request has been authenticated, so that one user’s heavy traffic does not affect another user’s quota.

How do I reset a specific IP’s rate limit count?

Call limiter.resetKey(ip) where ip is the client’s IP address string. This immediately clears the counter for that key. For the Redis store, this deletes the corresponding key in Redis, so the reset takes effect across all app instances simultaneously. This method is useful for admin tooling where you need to manually unblock a customer who was legitimately rate limited (for example, after a support interaction confirms the client was not abusing the API).

Does rate limiting work with Node.js cluster mode?

Not with the default in-memory store. Each cluster worker maintains an independent counter, so a client can bypass limits by having requests routed to different workers. To enforce limits across all workers on a single machine, install and configure @express-rate-limit/cluster-memory-store as the store option. For multiple separate machines (horizontal scaling), use the Redis store with rate-limit-redis and a shared Redis instance. Both options are covered in the official express-rate-limit documentation.

What HTTP status code should a rate-limited response return?

429 Too Many Requests, as defined in RFC 6585. The response should include a Retry-After header indicating the number of seconds the client should wait before retrying, along with a JSON body that identifies the error type and limit that was exceeded. The standardHeaders: 'draft-7' option emits a combined RateLimit header and a RateLimit-Policy header on all responses so clients can monitor their quota proactively. Current express-rate-limit releases also set Retry-After on the 429 response whenever standardHeaders or legacyHeaders is enabled, so you only need to set it explicitly in the handler callback if you disable both or want a different value.
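If you do set Retry-After yourself in a custom handler, deriving it from the time left in the window is more accurate than sending the full window length. The sketch below assumes the store exposes a reset time as a Date, which express-rate-limit provides on req.rateLimit.resetTime for stores that track it; retryAfterSeconds is a hypothetical helper name.

```javascript
// Seconds until the window resets, rounded up and never below 1.
// Falls back to the full window length when no reset time is available.
function retryAfterSeconds(resetTime, windowMs, nowMs = Date.now()) {
  if (!(resetTime instanceof Date)) return Math.ceil(windowMs / 1000);
  const remainingMs = resetTime.getTime() - nowMs;
  return Math.max(1, Math.ceil(remainingMs / 1000));
}

// A window that resets 90.5 seconds from "now".
console.log(retryAfterSeconds(new Date(100500), 15 * 60 * 1000, 10000)); // 91
```

Inside a handler this would be used along the lines of res.set('Retry-After', String(retryAfterSeconds(req.rateLimit.resetTime, options.windowMs))).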

Is rate limiting enough to stop DDoS attacks?

No. Application-layer rate limiting in Express is effective against targeted abuse from a small number of IP addresses and against brute-force attempts from individual clients. Large-scale volumetric DDoS attacks involve thousands or millions of unique IP addresses sending traffic at rates far beyond what any Node.js process can even parse and reject. For volumetric DDoS protection, you need infrastructure-layer defenses: CDN-level traffic absorption (Cloudflare, AWS Shield, Akamai), anycast routing, or dedicated DDoS mitigation services. Rate limiting in Express is your last line of defense, not your first.

How do I test rate limiting without waiting for the window to expire?

Set a very short windowMs in your test environment using environment variables, for example 1,000 milliseconds or 5,000 milliseconds. This lets you trigger, observe, and verify rate limit expiration within the timeframe of a normal CI run. Between test cases, call limiter.resetKey(ip) to clear counters instantly without waiting for the window to expire. For Redis store tests, spin up a dedicated test Redis instance using Docker (docker run -d -p 6379:6379 redis:alpine) and use a unique key prefix per test run to prevent state from bleeding between test files.