Python Profiling & Optimization

⚡ Quick Overview

Python profiling and optimization is about measuring where your program spends time or memory and then improving those slow or heavy parts without changing the overall behavior of the code.

  • Profiling answers the question: "Where is my program slow?"
  • Optimization is the next step: "How can I make these parts faster or lighter?"
  • Always measure before and after optimization to verify the improvement.
  • Tools like cProfile, timeit, and line profilers help you find bottlenecks.

🔑 Key Concepts

  • Profiling – Collecting statistics about function calls (how often and how long).
  • Bottleneck – A slow part of the code that limits overall performance.
  • CPU time vs Wall time – CPU time is the time the CPU spends executing your code; wall time is elapsed real-world clock time (see the sketch after this list).
  • Big-O Complexity – Describes how runtime grows with input size (e.g., O(n), O(n²)).
  • Micro-optimizations – Tiny low-level tweaks; useful only after fixing algorithmic issues.
  • Algorithmic optimization – Choosing a better data structure or algorithm (usually biggest gains).
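
To make the CPU time vs wall time distinction concrete, here is a minimal sketch using the standard library's time module: time.process_time() measures CPU time, time.perf_counter() measures wall time, and the one-second sleep shows up only in the wall-time figure.

import time

def busy_loop():
    # Pure CPU work, no waiting
    total = 0
    for i in range(2_000_000):
        total += i
    return total

cpu_start = time.process_time()   # CPU time
wall_start = time.perf_counter()  # wall (real clock) time

busy_loop()
time.sleep(1)  # sleeping consumes wall time but almost no CPU time

print(f"CPU time:  {time.process_time() - cpu_start:.3f} s")
print(f"Wall time: {time.perf_counter() - wall_start:.3f} s")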

📖 Syntax and Theory

🔍 cProfile – built-in function-level profiler

cProfile gives you per-function statistics: number of calls and time spent.

  • Use it from the command line: python -m cProfile your_script.py
  • Use it in code with cProfile.run() or a Profile object (see the sketch below).
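
Here is a minimal sketch of the Profile-object approach, paired with pstats to sort and print the results; work() is just a placeholder workload.

import cProfile
import pstats

def work():
    # Placeholder workload to profile
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()    # start collecting statistics
work()
profiler.disable()   # stop collecting statistics

# Sort by internal time and show the five most expensive functions
pstats.Stats(profiler).sort_stats("tottime").print_stats(5)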

⏱️ timeit – micro-benchmarking tool

timeit is used for timing small snippets of code accurately by running them many times. Note that timeit() returns the total time for all runs (not an average), and repeat() performs the whole measurement several times so you can take the minimum.
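
As a minimal sketch, timeit.repeat() runs the whole measurement several times; taking the minimum is the usual way to filter out noise from other processes.

import timeit

# Three rounds of 10,000 runs each; the minimum is the least noisy estimate
times = timeit.repeat("sorted(range(1_000))", number=10_000, repeat=3)
print(f"Best of 3 rounds: {min(times):.4f} s for 10,000 runs")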

Typical workflow:

  1. Write a correct solution first.
  2. Profile to find bottlenecks (do not guess).
  3. Optimize the critical parts (data structure, algorithm, or implementation).
  4. Profile again to confirm the improvement.

💻 Code Examples

🔍 Example: Using cProfile to Find a Bottleneck

Below is an intentionally slow function that checks for duplicates using a nested loop; the input is built from unique values so the worst case is actually exercised. We will profile it with cProfile.

import cProfile
import random


def has_duplicates_slow(items):
    # O(n^2) approach – compare every pair
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def main():
    # Unique values force the worst case: every pair must be compared
    data = random.sample(range(1_000_000), 10_000)
    print("Has duplicates?", has_duplicates_slow(data))


if __name__ == "__main__":
    cProfile.run("main()", sort="tottime")

🚀 Example: Optimizing with a Better Data Structure

The same task can be solved much faster with a set, whose membership test is O(1) on average, reducing the overall algorithm to roughly O(n).

def has_duplicates_fast(items):
    # Faster O(n) duplicate check using a set
    seen = set()
    for value in items:
        if value in seen:
            return True
        seen.add(value)
    return False

⏱️ Example: Comparing Two Implementations with timeit

Use timeit from the standard library to compare the slow and fast versions. The snippet below repeats both function definitions so it runs on its own.

from timeit import timeit
import random

def has_duplicates_slow(items):
    # O(n^2) approach – compare every pair
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_fast(items):
    # O(n) duplicate check using a set
    seen = set()
    for value in items:
        if value in seen:
            return True
        seen.add(value)
    return False

def build_data():
    # Unique values, so the slow version cannot exit early
    return random.sample(range(1_000_000), 5_000)

data = build_data()

# Time the slow and fast implementations (3 runs each)
slow_time = timeit("has_duplicates_slow(data)", globals=globals(), number=3)
fast_time = timeit("has_duplicates_fast(data)", globals=globals(), number=3)

# Display the timing results
print(f"Slow version: {slow_time:.4f} seconds")
print(f"Fast version: {fast_time:.4f} seconds")

⏱️ Example: Manual Timing with time.perf_counter

For quick checks, you can also manually measure elapsed time using time.perf_counter().

import time


def do_work():
    # Simple loop to simulate some CPU work
    total = 0
    for i in range(1_000_000):
        total += i
    return total


# Record the start time
start = time.perf_counter()
result = do_work()
# Record the end time
end = time.perf_counter()

# Print the result and elapsed time
print("Result:", result)
print(f"Elapsed: {end - start:.6f} seconds")

📊 Live Output and Explanation

🔍 Interpreting cProfile Output

When you run the cProfile.run("main()", sort="tottime") example, you will see a table that includes columns like:

  • ncalls – number of calls to the function.
  • tottime – time spent in the function itself (excluding subcalls).
  • percall – average time per call; the column appears twice, once as tottime/ncalls and once as cumtime divided by primitive calls.
  • cumtime – cumulative time spent in the function and its subcalls.
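
For reference, the header row of the table cProfile prints looks like this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)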

The function with the highest tottime is usually your main bottleneck. In this example, has_duplicates_slow dominates the runtime because the unique input forces it to compare every pair, showing how expensive O(n²) behavior is for large lists.

After switching to has_duplicates_fast, rerun the profiler or the timeit comparison: you should see a large drop in the measured time. This confirms that the optimization is real and not just a guess.

💡 Tips and Best Practices

  • ✅ Measure first, optimize later. Do not optimize code based on intuition alone.
  • ✅ Focus on algorithmic improvements (data structures, complexity) before micro-tweaks.
  • ✅ Use cProfile for whole programs and timeit for small snippets.
  • ✅ Optimize only the hot paths that the profiler highlights as slow.
  • ✅ Keep code readable; document optimizations so future you understands why they exist.
  • ⚠️ Avoid premature optimization; it can make code complex without real benefits.
  • ⚠️ Always re-run tests after optimizing to ensure you did not change behavior.

📝 Practice Tasks

  1. Write a function that computes the sum of all pairs in a list (nested loops). Profile it using cProfile. Then rewrite it using a more efficient approach and compare the results.
  2. Take any algorithm you know (e.g., linear search vs binary search). Implement both, generate a large sorted list, and compare them using timeit.
  3. Implement a function that concatenates strings using + inside a loop and another that uses "".join() on a list. Use timeit to see the difference.
  4. Explore memory usage: try replacing a large list with a generator expression in some code and observe if performance or memory consumption changes.
  5. Take a project script you already have, run python -m cProfile -o profile.out your_script.py, and inspect where most time is spent; the pstats sketch below shows one way to read the saved profile.
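
For task 5, a minimal sketch of reading the saved profile with the standard library's pstats module (profile.out is the filename used in the task):

import pstats

# Load the stats written by: python -m cProfile -o profile.out your_script.py
stats = pstats.Stats("profile.out")

# Drop long paths, sort by internal time, show the ten most expensive functions
stats.strip_dirs().sort_stats("tottime").print_stats(10)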

🌍 Use Cases

  • Optimizing data processing scripts (log analysis, ETL jobs, CSV processing).
  • Speeding up web request handlers in frameworks like Django or Flask.
  • Improving performance of competitive programming or coding challenge solutions.
  • Finding slow parts of a machine learning pipeline (data loading, preprocessing).
  • Reducing execution time of scheduled batch jobs and background tasks.