# Pattern 17 — ThreadPoolExecutor for rate-limited fetchers, when async is overkill
## The pain
You need to call an external API 285 times to fetch funding rates for every perpetual contract on OKX. The exchange rate-limits public endpoints to 10 requests/second. You can either:
1. **Sequential**: 285 calls × 100 ms each = 28.5 seconds. Way too slow for a 5-minute refresh cycle that has 19 other exchanges to hit.
2. **Async with `aiohttp` + `asyncio.gather`**: fast, but now you've pulled in `aiohttp`, you've rewritten your code in `async def`, your test suite needs `pytest-asyncio`, your stack traces are 80% framework noise, and you're debugging "RuntimeError: This event loop is already running" during interactive REPL sessions.
3. **`concurrent.futures.ThreadPoolExecutor` with `requests`**: 8 worker threads, 8 seconds end-to-end, zero new dependencies, your code stays synchronous, your tests stay synchronous, your stack traces stay readable.
For I/O-bound workloads with a known concurrency cap, option 3 is almost always the right call. Async is for when you have *thousands* of concurrent connections per process and need fine-grained scheduling. For "fan out 285 HTTP calls under a 10 req/s ceiling", threads are the boring, working answer.
## When to use it
- You have N independent I/O calls (HTTP, DB, file system) that can be parallelized.
- The total concurrency is bounded (≤ ~50 threads in flight at once is a good ceiling).
- You're inside a synchronous codebase and don't want to convert it to async.
- The downstream rate limit is the constraint, not CPU.
## The code
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

session = requests.Session()
session.headers.update({"User-Agent": "myapp/1.0"})


def fetch_one(inst_id: str) -> dict | None:
    """Fetch one instrument's funding rate. Returns None on any failure."""
    try:
        r = session.get(
            "https://www.okx.com/api/v5/public/funding-rate",
            params={"instId": inst_id},
            timeout=10,
        )
        if r.status_code != 200:
            return None
        rows = r.json().get("data", [])
        return rows[0] if rows else None
    except Exception:
        return None


def fetch_all(inst_ids: list[str], workers: int = 8) -> list[dict]:
    """Fan out fetches with `workers` threads. Returns successful results in
    completion order (not input order)."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, inst): inst for inst in inst_ids}
        for fut in as_completed(futures):
            r = fut.result()
            if r is not None:
                results.append(r)
    return results
```
That's it. Pass a list of N work items, get back a list of results, with up to `workers` calls in flight at any moment.
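If you need results back in input order rather than completion order, `ThreadPoolExecutor.map` does the bookkeeping for you. A minimal sketch, using a stand-in function instead of the HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor


def double(n: int) -> int:
    # Stand-in for fetch_one: any one-argument callable works here.
    return n * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order, even if later items finish first.
    ordered = list(pool.map(double, [3, 1, 2]))

print(ordered)  # → [6, 2, 4]
```

One caveat: `map` re-raises the first worker exception when you iterate it, discarding the remaining results, so the `as_completed` pattern above is the better fit when individual failures should just be dropped.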
## Picking the worker count
The right number follows from Little's law: **in-flight requests ≈ request rate × per-call latency**, so multiply the rate limit by the per-call latency and add headroom.
- OKX public rate limit: ~20 req per 2 seconds = 10 req/s.
- Per-call latency: ~150 ms median (Europe → APAC HTTP).
- Theoretical concurrency to saturate the limit: 10 req/s × 0.15 s = 1.5 in-flight.
- Practical concurrency with bursts: 8 threads = comfortable margin under the limit even if a few calls take 500+ ms.
The general rule: **start with `workers = (rate_limit_per_sec × p95_latency_seconds) × 2` and round up to a small number like 4 or 8**. Then verify with the API's response headers (most exchanges expose `X-RateLimit-Remaining`).
If you go too high and start getting 429s, just lower the worker count. There's no clever solution — just fewer threads.
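The sizing rule above fits in a tiny helper (the function name and the ×2 headroom default are mine, not from any library):

```python
import math


def suggest_workers(rate_limit_per_sec: float, p95_latency_sec: float) -> int:
    """Threads needed to saturate a rate limit, doubled for burst headroom.

    Little's law gives in-flight requests = rate x latency; the x2 covers
    latency spikes. Round up and never go below 1.
    """
    return max(1, math.ceil(rate_limit_per_sec * p95_latency_sec * 2))


# OKX numbers from above: 10 req/s, ~150 ms p95 latency.
print(suggest_workers(10, 0.15))  # → 3
```

From there, rounding 3 up to a round number like 4 or 8 is a judgment call, not math — then verify against the API's rate-limit headers as described above.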
## When NOT to use it
- **Your work is CPU-bound.** Threads in Python don't help with CPU-bound work because of the GIL. Use `ProcessPoolExecutor`, or push the hot loop into NumPy, which releases the GIL during most heavy operations.
- **You need millions of concurrent connections.** Threads cost ~1 MB of stack each. At 10k threads you've burned 10 GB of RAM and the OS thread scheduler is choking. Use async.
- **You need cross-task coordination beyond simple fan-out/fan-in.** If you find yourself building a pipeline of "fetch → transform → push to next stage", look at `asyncio` or a message queue. Threads don't compose well past the simplest patterns.
- **The downstream is sequential anyway.** If your "external API" is actually a single SQLite database write, threading just adds lock contention.
## Error handling that actually works
The naive version above swallows all errors inside `fetch_one` and returns `None`, which is fine for the "best effort, drop failures" use case. For more visibility:
```python
from collections import Counter


def fetch_all_with_stats(
    inst_ids: list[str], workers: int = 8
) -> tuple[list[dict], Counter]:
    results = []
    failures = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, inst): inst for inst in inst_ids}
        for fut in as_completed(futures):
            inst = futures[fut]  # instrument id, handy for per-item logging
            try:
                r = fut.result()
            except Exception as e:
                failures[type(e).__name__] += 1
                continue
            if r is None:
                failures["empty_response"] += 1
            else:
                results.append(r)
    return results, failures
```
Now you can log `len(results)` vs `dict(failures)` and notice if the failure rate creeps above some threshold.
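Turning that into an actual alert takes a few more lines. One way to do it, with the 1% threshold being an arbitrary choice of mine:

```python
import logging
from collections import Counter

logger = logging.getLogger("fetcher")


def check_failure_rate(
    n_ok: int, failures: Counter, threshold: float = 0.01
) -> bool:
    """Return True (and log a warning) if the failure fraction exceeds threshold."""
    n_failed = sum(failures.values())
    total = n_ok + n_failed
    rate = n_failed / total if total else 0.0
    if rate > threshold:
        logger.warning("failure rate %.1f%% (%s)", rate * 100, dict(failures))
        return True
    return False


# 284 successes, 1 transient failure → 0.35%, under the 1% threshold.
print(check_failure_rate(284, Counter({"ConnectionError": 1})))  # → False
```

Called once per cycle after `fetch_all_with_stats`, this gives you a single log line to grep for when an exchange starts misbehaving.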
## Real numbers from my Funding Finder collector
The OKX fetcher in production:
- **285 instruments per cycle**, fetched every 5 minutes
- **8 worker threads**
- **~8 seconds end-to-end** (~28 ms wall-clock per instrument including all the overhead)
- **0 dependencies on async** — the codebase is synchronous Flask + synchronous requests + synchronous SQLite
- **Failure rate** measured over 10 days: < 0.1% (transient TCP resets that disappear on next cycle)
The async version of this would be ~40% faster end-to-end (5 sec vs 8 sec) but would require: `aiohttp`, an async test harness, `asyncio.run()` boilerplate, and an `async def` rewrite of every fetcher in the file. Not worth the 3 seconds.
## Further reading
- The `concurrent.futures` documentation — https://docs.python.org/3/library/concurrent.futures.html
- David Beazley, "Generators, Coroutines, Native Coroutines and async/await" — explains *why* async exists and when it actually beats threads. The TL;DR: when you have ≥10k concurrent I/O operations per process. https://www.dabeaz.com/coroutines/
- "Python's GIL is hurting me, do I need to use async?" (almost always: no) — https://lukasa.co.uk/2024/03/Python_GIL/
## The summary
For "fan out 100-500 HTTP calls under a rate limit", `ThreadPoolExecutor` is the boring answer. 8 lines of code, zero new deps, your codebase stays synchronous, your tests stay synchronous, and your stack traces stay readable. Async is for when you need 10,000+ concurrent connections per process. Most projects never get there. Don't pre-optimize.