TL;DR
- Concurrency = structure: multiple tasks are in progress at the same time (often interleaved on one CPU). Great for I/O-bound work.
- Parallelism = execution: multiple tasks are executing simultaneously on different cores. Great for CPU-bound work.
- You can have concurrency without parallelism (event loop) and parallelism without concurrency (single big vectorized task). Many systems use both.
- Choose async I/O / event loops for high-latency waits (HTTP/DB), and threads/processes/workers for heavy CPU. Limit concurrency to avoid overload.
One-minute mental model
```text
Time ─────────────────────────────────────────────────────────────

Concurrency (interleaving on one core)
  Task A: ███      ███      ███      ███
  Task B:     ███      ███      ███
  Task C:         ███      ███      ███

Parallelism (multi-core, truly simultaneous)
  Core 1: ███████████████████  Task A
  Core 2: ███████████████████  Task B
  Core 3: ███████████████████  Task C
```
- Concurrency is how you structure a program to make progress on multiple tasks by switching between them when one would otherwise wait.
- Parallelism is about using more hardware (cores/threads/GPUs) to run tasks at the same moment.
Where each shines
| Workload | Concurrency (async/evented) | Parallelism (multi-core/GPU) |
|---|---|---|
| Serve thousands of HTTP requests | ✅ Excellent | ⚠️ Unnecessary unless handlers are CPU-heavy |
| Fetch many APIs / DB queries | ✅ Excellent | ⚠️ Network-bound; parallel cores don’t help much |
| Image/video encoding, ML inference | ⚠️ Limited | ✅ Use threads, processes, SIMD, GPU |
| Data transforms over large arrays | ⚠️ Often limited | ✅ Parallel map/reduce, vectorization |
| Background jobs & pipelines | ✅ Orchestrate many waits | ✅ Parallel-execute CPU steps |
JavaScript examples
Concurrency (event loop, async I/O):
```js
async function fetchAll(ids) {
  // Start all requests first, then await them together.
  const tasks = ids.map(id => fetch(`/api/items/${id}`).then(r => r.json()));
  return Promise.all(tasks);
}
```
Parallelism (Web Workers for CPU-bound work):
```js
// main.js
const worker = new Worker("worker.js");
worker.postMessage({ n: 50_000_000 });
worker.onmessage = (e) => console.log("sum:", e.data);
```

```js
// worker.js
self.onmessage = (e) => {
  let s = 0;
  for (let i = 0; i < e.data.n; i++) s += i;
  postMessage(s);
};
```
JavaScript executes your code on a single main thread; use Web Workers in the browser (or `worker_threads`/`cluster` in Node.js) for CPU parallelism.
Python examples
Concurrency (asyncio for I/O):
```python
import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()

async def main(urls):
    async with aiohttp.ClientSession() as s:
        tasks = [asyncio.create_task(fetch(s, u)) for u in urls]
        return await asyncio.gather(*tasks)

asyncio.run(main(["/a", "/b", "/c"]))
```
Parallelism for CPU (GIL-aware):
```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required so worker processes can import this module safely
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_heavy, [10_000_000] * 4))  # runs on multiple cores
```
Python’s GIL limits CPU-bound threads; use processes (or native extensions/NumPy/Cython) for parallel CPU work. For blocking I/O in threads, `concurrent.futures.ThreadPoolExecutor` is fine, as sketched below.
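A minimal sketch of that thread-pool pattern (the URLs and the `fetch_blocking` helper are illustrative, not from the original):

```python
# Threads work for *blocking* I/O despite the GIL: the GIL is
# released while a thread waits on the network.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen  # stdlib blocking HTTP client

def fetch_blocking(url: str) -> int:
    # Illustrative helper: returns the response size in bytes.
    with urlopen(url, timeout=10) as r:
        return len(r.read())

urls = ["https://example.com"] * 8  # placeholder URLs
with ThreadPoolExecutor(max_workers=8) as pool:
    sizes = list(pool.map(fetch_blocking, urls))  # waits overlap across threads
```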
Go quick glance
Concurrency (goroutines + channels):
```go
package main

import "fmt"

func main() {
	ch := make(chan int)
	for i := 0; i < 3; i++ {
		go func(v int) { ch <- v * v }(i) // each goroutine sends one result
	}
	for i := 0; i < 3; i++ {
		fmt.Println(<-ch)
	}
}
```
Go’s runtime multiplexes goroutines over OS threads. With `GOMAXPROCS` > 1 (the default on multi-core machines), many goroutines also run in parallel.
Pitfalls & fixes
| Problem | Why it hurts | Fix |
|---|---|---|
| `await` in tight loops (JS/Python) | Serializes work | Start tasks first; await together (`Promise.all`, `gather`) |
| Blocking CPU on the event loop | Starvation, timeouts | Offload to threads/processes/workers |
| Too much parallelism | Thrashing, rate limits | Bound concurrency with semaphores/pools/queues (sketch below) |
| Data races (threads) | Corruption, heisenbugs | Use immutable data, message passing, or proper locks/atomics |
| Shared mutable global state | Hard to reason about and test | Prefer pure functions; inject dependencies |
| Forgetting timeouts/cancellation | Hangs & resource leaks | Timeouts, `AbortController`, `asyncio.timeout`, cancellation tokens |
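The “bound concurrency” and timeout fixes above, as a minimal asyncio sketch (the limit of 10, the 5 s timeout, and the `fetch` stand-in are illustrative assumptions):

```python
import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real HTTP call
    return url

async def main():
    sem = asyncio.Semaphore(10)  # cap in-flight tasks; 10 is arbitrary

    async def bounded(url: str) -> str:
        async with sem:                     # at most 10 tasks run at once
            async with asyncio.timeout(5):  # give up on hangs (Python 3.11+)
                return await fetch(url)

    urls = [f"/item/{i}" for i in range(100)]
    return await asyncio.gather(*(bounded(u) for u in urls))

asyncio.run(main())
```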
How to choose (decision guide)
- Is the task mostly waiting on I/O? → Concurrency (async/evented) first.
- Is the task heavy CPU? → Parallelism (threads/processes/GPU).
- Is it a mix? → Orchestrate with concurrency; offload hot CPU parts to parallel workers (see the sketch after this list).
- Running in serverless? → Concurrency saves cold-start cost; parallelism limited by per-function CPU quota.
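For the mixed case, one possible shape (the CPU-heavy `transform` step is illustrative): keep waits on the event loop and push hot CPU work to a process pool via `run_in_executor`.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def transform(data: bytes) -> int:
    # Illustrative CPU-heavy step (e.g. encode/compress/score).
    return sum(data)

async def handle(item: bytes, pool: ProcessPoolExecutor) -> int:
    loop = asyncio.get_running_loop()
    # I/O waits stay on the event loop; CPU work runs on another core.
    return await loop.run_in_executor(pool, transform, item)

async def main():
    with ProcessPoolExecutor() as pool:
        items = [bytes(range(256))] * 8  # placeholder payloads
        return await asyncio.gather(*(handle(i, pool) for i in items))

if __name__ == "__main__":  # required for process pools on spawn platforms
    asyncio.run(main())
```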
Practical patterns
- JS: event loop for I/O + Web Workers/worker threads for CPU sections.
- Python: `asyncio` for I/O; `ProcessPoolExecutor` / `multiprocessing` (or vectorized libs) for CPU.
- Go: goroutines everywhere; tune `GOMAXPROCS` and use worker pools for CPU-heavy parts.
- Queues: use a job queue (SQS, RabbitMQ, Celery, BullMQ) to control fan-out and backpressure; the same idea in-process is sketched below.
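The queue idea, in-process: a bounded `asyncio.Queue` gives backpressure for free, because producers block when it fills (queue size and worker count below are arbitrary choices):

```python
import asyncio

async def producer(q: asyncio.Queue):
    for i in range(100):
        await q.put(i)  # blocks when the queue is full -> backpressure
    for _ in range(4):
        await q.put(None)  # one sentinel per worker signals shutdown

async def worker(q: asyncio.Queue):
    while (job := await q.get()) is not None:
        await asyncio.sleep(0.01)  # stand-in for real job handling

async def main():
    q = asyncio.Queue(maxsize=10)  # bounded: producer slows to consumer pace
    await asyncio.gather(producer(q), *(worker(q) for _ in range(4)))

asyncio.run(main())
```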
Quick checklist
- [ ] Classify tasks: I/O-bound vs CPU-bound.
- [ ] Use async I/O to raise throughput when waiting.
- [ ] Add worker pools for CPU; cap concurrency.
- [ ] Always set timeouts and support cancellation.
- [ ] Measure: event loop lag, p95 latency, CPU %, run queue, GC pressure (lag probe sketched after this list).
- [ ] Keep code race-safe: prefer message passing or immutable data structures.
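For the “measure” item, a minimal event-loop-lag probe (the 100 ms interval is an arbitrary choice): sleep for a known interval and report any extra delay, which is time the loop spent blocked.

```python
import asyncio, time

async def loop_lag_monitor(interval: float = 0.1):
    # A healthy loop wakes ~interval seconds after sleeping; any
    # extra delay is time the loop spent blocked by other work.
    while True:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lag = time.perf_counter() - start - interval
        print(f"event loop lag: {lag * 1000:.1f} ms")
```

Run it as a background task (`asyncio.create_task(loop_lag_monitor())`) alongside the real workload.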
One-minute adoption plan
- Identify top I/O waits and CPU hotspots (profiling).
- Convert I/O paths to async/await; group awaits.
- Introduce bounded worker pools for CPU (threads/processes/workers).
- Add timeouts, cancellation, and backpressure.
- Monitor & tune: throughput vs. latency vs. cost.