3 Comments
Neural Foundry

The concurrency design here is really smart, especially batching GeoIP lookups instead of hitting APIs individually. 500 concurrent scans with uvloop getting 104k proxies done in 11 minutes is solid throughput. I've seen similar scanners choke at around 50-100 concurrent because they don't handle backpressure well. Curious how the semaphore limits are tuned here to avoid overwhelming the event loop.

Bob Bragg

But did they use THE AI GODS?

Bob Bragg

The semaphore isn’t finely tuned to protect the event loop itself — it simply caps how many full proxy scans can run at once. Each scan is mostly waiting on network I/O (connects, reads, HTTP), so even at 500 concurrency very little Python code is running at any given time.

That keeps the loop responsive. Fast-fail paths, short timeouts, and linear retry backoff prevent retry storms, which is where a lot of scanners fall over around 50–100 concurrent. GeoIP doesn’t become a bottleneck because lookups are either cheap or batched, so they never add per-proxy API pressure. uvloop helps with scheduling overhead, but the stability mostly comes from limiting whole scan lifecycles rather than trying to throttle every individual step.
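
For anyone who wants to see the shape of that, here is a rough sketch of the pattern, not the scanner's actual code. The `probe` function, the timeout, the attempt count, and the backoff step are all placeholder assumptions; only the 500 cap comes from the numbers discussed above.

```python
import asyncio

MAX_CONCURRENT_SCANS = 500  # the cap mentioned in this thread
MAX_ATTEMPTS = 3            # assumed value, for illustration only
STEP_TIMEOUT = 5.0          # assumed per-attempt timeout in seconds
BACKOFF_STEP = 0.5          # assumed; linear backoff: 0.5s, 1.0s, ...


async def probe(proxy: str) -> bool:
    """Hypothetical stand-in for the real connect/read/HTTP check."""
    await asyncio.sleep(0.01)          # simulates waiting on network I/O
    return not proxy.endswith(":0")    # pretend port 0 is a dead proxy


async def scan_one(proxy: str, sem: asyncio.Semaphore) -> bool:
    # The semaphore is held for the whole scan lifecycle, retries included,
    # so it limits complete scans rather than individual network steps.
    async with sem:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                # A definite answer (True or False) returns immediately:
                # that is the fast-fail path, with no retry.
                return await asyncio.wait_for(probe(proxy), STEP_TIMEOUT)
            except (asyncio.TimeoutError, OSError):
                if attempt < MAX_ATTEMPTS:
                    # Linear backoff keeps retries from piling up.
                    await asyncio.sleep(BACKOFF_STEP * attempt)
        return False


async def scan_all(proxies: list[str], limit: int = MAX_CONCURRENT_SCANS) -> dict[str, bool]:
    sem = asyncio.Semaphore(limit)
    # All coroutines are scheduled up front; the semaphore decides how many
    # are actually scanning at any moment.
    results = await asyncio.gather(*(scan_one(p, sem) for p in proxies))
    return dict(zip(proxies, results))
```

Something like `asyncio.run(scan_all(proxies))` would drive it. Because each scan spends nearly all its time awaiting I/O, the cap mainly bounds open sockets and in-flight retries, not CPU work on the loop.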
