# Pattern 17 — ThreadPoolExecutor for rate-limited fetchers, when async is overkill
## The pain
You need to call an external API 285 times to fetch funding rates for every perpetual contract on OKX. The exchange rate-limits public endpoints to 10 requests/second. You can either:
1. **Sequential**: 285 calls × 100 ms each = 28.5 seconds. Way too slow for a 5-minute refresh cycle that has 19 other exchanges to hit.
2. **Async with `aiohttp` + `asyncio.gather`**: fast, but now you've pulled in `aiohttp`, you've rewritten your code in `async def`, your test suite needs `pytest-asyncio`, your stack traces are 80% framework noise, and you're debugging "RuntimeError: This event loop is already running" during interactive REPL sessions.
3. **`concurrent.futures.ThreadPoolExecutor` with `requests`**: 8 worker threads, 8 seconds end-to-end, zero new dependencies, your code stays synchronous, your tests stay synchronous, your stack traces stay readable.
For I/O-bound workloads with a known concurrency cap, option 3 is almost always the right call. Async is for when you have *thousands* of concurrent connections per process and need fine-grained scheduling. For "fan out 285 HTTP calls under a 10 req/s ceiling", threads are the boring, working answer.
## When to use it
- You have N independent I/O calls (HTTP, DB, file system) that can be parallelized.
- The total concurrency is bounded (≤ ~50 threads in flight at once is a good ceiling).
- You're inside a synchronous codebase and don't want to convert it to async.
- The downstream rate limit is the constraint, not CPU.
## The code
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

session = requests.Session()
session.headers.update({"User-Agent": "myapp/1.0"})


def fetch_one(inst_id: str) -> dict | None:
    """Fetch one instrument's funding rate. Returns None on any failure."""
    try:
        r = session.get(
            "https://www.okx.com/api/v5/public/funding-rate",
            params={"instId": inst_id},
            timeout=10,
        )
        if r.status_code != 200:
            return None
        rows = r.json().get("data", [])
        return rows[0] if rows else None
    except Exception:
        return None


def fetch_all(inst_ids: list[str], workers: int = 8) -> list[dict]:
    """Fan out fetches with `workers` threads. Returns successful results in
    completion order (not input order)."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, inst): inst for inst in inst_ids}
        for fut in as_completed(futures):
            r = fut.result()
            if r is not None:
                results.append(r)
    return results
```
That's it. Pass a list of N work items, get back a list of results, with up to `workers` calls in flight at any moment.
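If you need results back in input order rather than completion order, `ThreadPoolExecutor.map` does the bookkeeping for you. A minimal sketch, using a stand-in function instead of the HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor


def double(n: int) -> int:
    # Stand-in for fetch_one: any one-argument callable works here.
    return n * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order, even if later items finish first.
    ordered = list(pool.map(double, [3, 1, 2]))

print(ordered)  # → [6, 2, 4]
```

One caveat: `map` re-raises the first worker exception when you iterate it, discarding the remaining results, so the `as_completed` pattern above is the better fit when individual failures should just be dropped.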
## Picking the worker count
The right number follows from Little's law: **in-flight requests ≈ request rate × per-call latency**, so multiply the rate limit by the per-call latency and add headroom.
- OKX public rate limit: ~20 req per 2 seconds = 10 req/s.
- Per-call latency: ~150 ms median (Europe → APAC HTTP).
- Theoretical concurrency to saturate the limit: 10 req/s × 0.15 s = 1.5 in-flight.
- Practical concurrency with bursts: 8 threads = comfortable margin under the limit even if a few calls take 500+ ms.
The general rule: **start with `workers = (rate_limit_per_sec × p95_latency_seconds) × 2` and round up to a small number like 4 or 8**. Then verify with the API's response headers (most exchanges expose `X-RateLimit-Remaining`).
If you go too high and start getting 429s, just lower the worker count. There's no clever solution — just fewer threads.
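The sizing rule above fits in a tiny helper (the function name and the ×2 headroom default are mine, not from any library):

```python
import math


def suggest_workers(rate_limit_per_sec: float, p95_latency_sec: float) -> int:
    """Threads needed to saturate a rate limit, doubled for burst headroom.

    Little's law gives in-flight requests = rate x latency; the x2 covers
    latency spikes. Round up and never go below 1.
    """
    return max(1, math.ceil(rate_limit_per_sec * p95_latency_sec * 2))


# OKX numbers from above: 10 req/s, ~150 ms p95 latency.
print(suggest_workers(10, 0.15))  # → 3
```

From there, rounding 3 up to a round number like 4 or 8 is a judgment call, not math — then verify against the API's rate-limit headers as described above.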
## When NOT to use it
- **Your work is CPU-bound.** Threads in Python don't help with CPU-bound work because of the GIL. Use `ProcessPoolExecutor`, or push the hot loop into NumPy, which releases the GIL during most heavy operations.
- **You need millions of concurrent connections.** Threads cost ~1 MB of stack each. At 10k threads you've burned 10 GB of RAM and the OS thread scheduler is choking. Use async.
- **You need cross-task coordination beyond simple fan-out/fan-in.** If you find yourself building a pipeline of "fetch → transform → push to next stage", look at `asyncio` or a message queue. Threads don't compose well past the simplest patterns.
- **The downstream is sequential anyway.** If your "external API" is actually a single SQLite database write, threading just adds lock contention.
## Error handling that actually works
The naive version above swallows all errors inside `fetch_one` and returns `None`, which is fine for the "best effort, drop failures" use case. For more visibility:
```python
from collections import Counter


def fetch_all_with_stats(
    inst_ids: list[str], workers: int = 8
) -> tuple[list[dict], Counter]:
    results = []
    failures = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, inst): inst for inst in inst_ids}
        for fut in as_completed(futures):
            inst = futures[fut]  # instrument id, handy for per-item logging
            try:
                r = fut.result()
            except Exception as e:
                failures[type(e).__name__] += 1
                continue
            if r is None:
                failures["empty_response"] += 1
            else:
                results.append(r)
    return results, failures
```
Now you can log `len(results)` vs `dict(failures)` and notice if the failure rate creeps above some threshold.
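Turning that into an actual alert takes a few more lines. One way to do it, with the 1% threshold being an arbitrary choice of mine:

```python
import logging
from collections import Counter

logger = logging.getLogger("fetcher")


def check_failure_rate(
    n_ok: int, failures: Counter, threshold: float = 0.01
) -> bool:
    """Return True (and log a warning) if the failure fraction exceeds threshold."""
    n_failed = sum(failures.values())
    total = n_ok + n_failed
    rate = n_failed / total if total else 0.0
    if rate > threshold:
        logger.warning("failure rate %.1f%% (%s)", rate * 100, dict(failures))
        return True
    return False


# 284 successes, 1 transient failure → 0.35%, under the 1% threshold.
print(check_failure_rate(284, Counter({"ConnectionError": 1})))  # → False
```

Called once per cycle after `fetch_all_with_stats`, this gives you a single log line to grep for when an exchange starts misbehaving.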
## Real numbers from my Funding Finder collector
The OKX fetcher in production:
- **285 instruments per cycle**, fetched every 5 minutes
- **8 worker threads**
- **~8 seconds end-to-end** (~28 ms wall-clock per instrument including all the overhead)
- **0 dependencies on async** — the codebase is synchronous Flask + synchronous requests + synchronous SQLite
- **Failure rate** measured over 10 days: < 0.1% (transient TCP resets that disappear on next cycle)
The async version of this would be ~40% faster end-to-end (5 sec vs 8 sec) but would require: `aiohttp`, an async test harness, `asyncio.run()` boilerplate, and an `async def` rewrite of every fetcher in the file. Not worth the 3 seconds.
## Further reading
- The `concurrent.futures` documentation — https://docs.python.org/3/library/concurrent.futures.html
- David Beazley, "Generators, Coroutines, Native Coroutines and async/await" — explains *why* async exists and when it actually beats threads. The TL;DR: when you have ≥10k concurrent I/O operations per process. https://www.dabeaz.com/coroutines/
- "Python's GIL is hurting me, do I need to use async?" (almost always: no) — https://lukasa.co.uk/2024/03/Python_GIL/
## The summary
For "fan out 100-500 HTTP calls under a rate limit", `ThreadPoolExecutor` is the boring answer. 8 lines of code, zero new deps, your codebase stays synchronous, your tests stay synchronous, and your stack traces stay readable. Async is for when you need 10,000+ concurrent connections per process. Most projects never get there. Don't pre-optimize.