
Python Concurrency – Multiprocessing


⚡ Quick Overview

Python’s multiprocessing module allows you to run code in multiple processes, taking advantage of multiple CPU cores. Unlike threading, which is limited by the Global Interpreter Lock (GIL) for CPU-bound tasks, multiprocessing starts separate Python interpreter processes that can truly execute in parallel.

Use multiprocessing when you want to speed up heavy CPU work such as:

  • Image processing or video encoding.
  • Mathematical simulations or numerical analysis.
  • Data processing on large files or datasets.
  • Any CPU-intensive loop that can be split into independent chunks.

🔑 Key Concepts

  • Process: An independent Python interpreter with its own memory space.
  • GIL (Global Interpreter Lock): Prevents threads within a single process from executing Python bytecode in parallel; each process has its own interpreter and GIL, so processes are unaffected.
  • Process class: Low-level API to start and manage processes manually.
  • Pool class: Higher-level API to run a function on many inputs in parallel.
  • IPC (Inter-Process Communication): Techniques like Queue, Pipe, or shared memory to exchange data between processes.
  • Pickling: Arguments and results are serialized (pickled) to be sent between processes, so they must be picklable (see the sketch after this list).
  • if __name__ == "__main__": Guard required under the spawn start method (the default on Windows and macOS) to keep child processes from re-importing the script and spawning endlessly.
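
Because everything crossing a process boundary is pickled, module-level functions work but lambdas, nested functions, and open handles do not. A minimal sketch of the constraint (the square function here is purely illustrative):

from multiprocessing import Pool


def square(x):  # module-level functions pickle fine
    return x * x


if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # works: [1, 4, 9]
        # pool.map(lambda x: x * x, [1, 2, 3])  # would fail: lambdas can't be pickled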

🧱 Syntax and Building Blocks

⚙️ Creating a Process

  • from multiprocessing import Process
  • p = Process(target=func, args=(arg1, arg2))
  • p.start() – starts the new process.
  • p.join() – waits for the process to finish.

🔁 Using a Pool of Workers

  • from multiprocessing import Pool
  • with Pool(processes=n) as pool:
  • results = pool.map(func, iterable)
  • Ideal for “apply the same function to many items” patterns.

🔗 Sharing Data

  • Queue – safe FIFO queue shared between processes.
  • Pipe – two-way communication channel.
  • Value, Array, Manager – share simple values, arrays, and richer data structures safely (see the sketch below).
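
As a rough sketch of these primitives (standard library only; the child function and counter name are illustrative), a parent and child can exchange a message over a Pipe while a Value holds a shared number:

from multiprocessing import Process, Pipe, Value


def child(conn, counter):
    conn.send("hello from the child process")  # push a message through the pipe
    with counter.get_lock():  # a Value carries its own lock for safe updates
        counter.value += 1
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    counter = Value("i", 0)  # 'i' = C int, initial value 0

    p = Process(target=child, args=(child_conn, counter))
    p.start()

    print(parent_conn.recv())  # blocks until the child sends
    p.join()
    print(f"counter = {counter.value}")  # 1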

🛠 Important Utilities

  • multiprocessing.cpu_count() – number of CPUs in the machine.
  • set_start_method() – controls how new processes are spawned; call it at most once, inside the __main__ guard (illustrated below).
  • Start methods: spawn, fork, forkserver (availability is platform dependent).
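
A short sketch tying these utilities together, assuming nothing beyond the standard library:

import multiprocessing as mp


def report():
    print(f"child sees start method: {mp.get_start_method()}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # "fork" and "forkserver" are Unix-only
    print(f"CPU cores: {mp.cpu_count()}")

    p = mp.Process(target=report)
    p.start()
    p.join()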

💻 Code Examples

🚀 Run a simple function in a separate process
from multiprocessing import Process
import time


def worker(name, delay):
    print(f"[{name}] starting work")
    time.sleep(delay)
    print(f"[{name}] finished after {delay} seconds")


if __name__ == "__main__":
    p1 = Process(target=worker, args=("Process-1", 2))
    p2 = Process(target=worker, args=("Process-2", 3))

    p1.start()  # run in parallel
    p2.start()

    p1.join()   # wait for p1 to finish
    p2.join()

    print("All processes completed")
🧮 Use a Pool to parallelize a CPU-heavy function
from multiprocessing import Pool, cpu_count
import math
import time


# Check if a number is prime
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0 and n != 2:
        return False
    limit = int(math.sqrt(n)) + 1
    for i in range(3, limit, 2):
        if n % i == 0:
            return False
    return True


if __name__ == "__main__":
    # Large numbers to test for primality
    numbers = [10_000_019, 10_000_033, 10_000_079, 10_000_081]

    print(f"CPU cores available: {cpu_count()}")

    start = time.perf_counter()
    # Use a Pool to distribute work across processes
    with Pool() as pool:
        results = pool.map(is_prime, numbers)
    end = time.perf_counter()

    for n, r in zip(numbers, results):
        print(f"{n} prime? {r}")

    print(f"Completed in {end - start:.3f} seconds using multiprocessing")
📬 Share results using a Queue
from multiprocessing import Process, Queue


def square_worker(numbers, queue):
    for n in numbers:
        queue.put((n, n * n))
    queue.put(None)  # sentinel value to mark end


if __name__ == "__main__":
    nums = [1, 2, 3, 4, 5]
    q = Queue()

    p = Process(target=square_worker, args=(nums, q))
    p.start()

    while True:
        item = q.get()
        if item is None:
            break
        n, sq = item
        print(f"{n} squared is {sq}")

    p.join()
    print("Done reading from queue")
⏱ Compare sequential vs multiprocessing runtime (concept)
# Pseudocode / structure:
# 1. Define a CPU-heavy function.
# 2. Run it in a simple for-loop (sequential), measure time.
# 3. Run it again using a Pool.map, measure time.
# 4. Compare results to see speedup on multi-core CPUs.
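
A minimal runnable version of this comparison (it repeats the is_prime() check from the Pool example; the actual speedup depends on your core count):

from multiprocessing import Pool
import math
import time


def is_prime(n: int) -> bool:  # same primality check as the Pool example
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True


if __name__ == "__main__":
    numbers = [10_000_019, 10_000_033, 10_000_079, 10_000_081] * 50

    start = time.perf_counter()
    sequential = [is_prime(n) for n in numbers]  # one core, one task at a time
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:  # one worker per core by default
        parallel = pool.map(is_prime, numbers)
    par_time = time.perf_counter() - start

    assert sequential == parallel
    print(f"sequential: {seq_time:.3f}s | multiprocessing: {par_time:.3f}s")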

🖥 Live Output / Explanation

What is happening in these multiprocessing examples?

  1. Basic Process example:
    • Two processes (p1 and p2) run the worker() function independently.
    • They sleep for different durations but run in parallel, so the total wall time is roughly the longer delay (about 3 seconds), not the sum.
    • join() ensures the main process waits for them before printing the final message.
  2. Pool.map() example:
    • The is_prime() function checks whether a number is prime.
    • Pool.map() distributes the list of numbers across worker processes.
    • Each process checks different numbers in parallel, reducing total time.
  3. Queue communication example:
    • The worker process computes squares of numbers and puts results into a shared queue.
    • The main process reads from the queue and prints as results arrive.
    • A sentinel value (None) indicates that there is no more data.
  4. Sequential vs multiprocessing idea:
    • Sequential execution uses one core — tasks run one after another.
    • Multiprocessing distributes work across cores — tasks run simultaneously.
    • On multi-core CPUs, CPU-heavy workloads often finish faster with multiprocessing.

✅ Writing Robust Multiprocessing Code

  • Always protect the entry point with if __name__ == "__main__":, especially on Windows.
  • Use Pool for simple “map this function over a list” style problems.
  • Ensure that functions and arguments passed to processes are picklable (no lambdas, nested functions, open file handles, etc.).
  • Avoid sharing too much data — sending large objects between processes can be slow.
  • Use Queue or Manager for coordinated communication between processes (see the Manager sketch after this list).
  • Limit the number of processes: for CPU-bound work, more workers than cpu_count() rarely helps and only adds overhead.
  • Use multiprocessing for CPU-bound tasks; for I/O-bound tasks, threads or async I/O may be simpler.
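
As an example of the Manager approach mentioned above (the tally function and key names are illustrative), each worker writes its own entry into a shared dictionary, so no two processes touch the same key:

from multiprocessing import Manager, Process


def tally(name, shared):
    shared[name] = sum(range(1_000))  # each worker writes under its own key


if __name__ == "__main__":
    with Manager() as manager:
        results = manager.dict()  # proxy dict usable from every process
        workers = [
            Process(target=tally, args=(f"worker-{i}", results))
            for i in range(4)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(dict(results))  # copy out before the manager shuts down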

🎯 Practice Tasks with Python Multiprocessing

  1. Prime checker benchmark:
    Implement a prime-checking program that runs sequentially and then using Pool.map(). Compare the execution times for a large list of numbers.
  2. Image processing pipeline:
    Use multiprocessing to apply a simple transformation (e.g., resize or grayscale) to many image files in a folder.
  3. Word count in large files:
    Split a large text file into chunks and use multiple processes to count words in each chunk, then combine results.
  4. Queue-based logger:
    Implement a worker that performs calculations and sends log messages to the main process via a Queue, where they are written to a single log file.
  5. CPU usage visualizer (conceptual):
    Write a script that starts one process per core, each running a CPU-heavy loop, and monitor CPU usage using your OS tools.

Run your scripts from the command line with: python multiprocessing_demo.py

📌 Common Use Cases

  • Speeding up data science or ETL pipelines on multi-core machines.
  • Parallelizing CPU-heavy algorithms in simulations and scientific computing.
  • Batch processing of media files (images, audio, video).
  • Server-side tasks that need to utilize all cores (e.g., background jobs, report generation).
  • Any workload where tasks are independent and can be safely split across processes.