Mehmet Ünlü

I study Electronics and Communication Engineering at ITU and build hands-on projects around forecasting, computer vision, and making data workflows faster.

Optimization, NumPy, Pandas, Vectorization, Parallelism, ML Systems

ML Pipeline Runtime Optimization

Refactored a feature-heavy ML pipeline around vectorized operations, matrix-based computation, and memory-aware transformations.

  • Runtime: 40m -> 4m (end-to-end pipeline)
  • Rows: 3M (feature computation scale)
  • Data: ~1 GB (tabular workload)
  • Speedup: 10x (iteration velocity)

Problem

Long runtimes and memory pressure made each iteration slow and expensive, and the pipeline was difficult to scale to production-like data sizes.

Challenge

The pipeline mixed repeated dataframe operations, Python loops, redundant intermediate objects, and compute paths that did not match the shape of the data.

Architecture

How the pieces fit together.

The optimized workflow batches feature transforms, replaces row-wise loops with NumPy/Pandas vectorization, parallelizes independent computation, and controls temporary allocations.
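
The core idea of replacing row-wise loops with column-level operations can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names `spend` and `visits` and the `spend_per_visit` feature are hypothetical stand-ins for a real feature transform.

```python
import numpy as np
import pandas as pd


def ratio_feature_loop(df: pd.DataFrame) -> pd.Series:
    """Row-wise baseline: one Python-level iteration per row."""
    values = []
    for _, row in df.iterrows():
        # Guard against division by zero, one row at a time.
        denom = row["visits"] if row["visits"] > 0 else 1
        values.append(row["spend"] / denom)
    return pd.Series(values, index=df.index, name="spend_per_visit")


def ratio_feature_vectorized(df: pd.DataFrame) -> pd.Series:
    """Same feature, computed on whole columns at once."""
    spend = df["spend"].to_numpy(dtype=np.float64)
    visits = df["visits"].to_numpy(dtype=np.float64)
    # np.where applies the zero guard to the entire array in one call.
    denom = np.where(visits > 0, visits, 1.0)
    return pd.Series(spend / denom, index=df.index, name="spend_per_visit")


# Small synthetic frame (hypothetical data) to compare both versions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "spend": rng.uniform(0, 100, size=1_000),
    "visits": rng.integers(0, 5, size=1_000),
})
slow = ratio_feature_loop(df)
fast = ratio_feature_vectorized(df)
```

Both functions return the same values. The vectorized version avoids building a Python object for every row, which is where most of the loop's time typically goes.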

Architecture View

System structure and decision flow

Before: 40m
After: 4m

  • Python loops: interpreter overhead
  • Copies: memory pressure
  • Repeated joins: redundant work
  • Unbatched features: slow inference

Dataset / Inputs

  • Approximately 1 GB of tabular data and 3 million rows flowing through feature engineering and inference steps.

Technical Decisions

  • Profile first, then optimize the hottest compute paths.
  • Replace row-level Python operations with vectorized NumPy/Pandas transformations.
  • Batch repeated feature calculations and remove redundant intermediate frames.
  • Use parallelism only for independent workloads with clear memory boundaries.
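
The last decision above, parallelism restricted to independent workloads with clear memory boundaries, might look roughly like the sketch below. It assumes the heavy work happens inside NumPy calls, so a thread pool is used. The `row_energy` feature, the chunk count, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def row_energy(block: np.ndarray) -> np.ndarray:
    """Hypothetical per-row feature: sum of squares across columns."""
    return np.einsum("ij,ij->i", block, block)


def parallel_feature(data: np.ndarray, n_chunks: int = 4, workers: int = 4) -> np.ndarray:
    # One preallocated output; each task writes only to its own slice,
    # so tasks never touch each other's memory.
    out = np.empty(data.shape[0], dtype=np.float64)
    bounds = np.linspace(0, data.shape[0], n_chunks + 1, dtype=int)

    def work(i: int) -> None:
        lo, hi = bounds[i], bounds[i + 1]
        out[lo:hi] = row_energy(data[lo:hi])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, range(n_chunks)))
    return out


data = np.random.default_rng(1).normal(size=(10_000, 8))
result = parallel_feature(data)
```

Because every chunk reads a disjoint row range and writes a disjoint output slice, no locking is needed. For work dominated by pure-Python code, a process pool would likely be the better fit.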

Implementation Details

  • Converted repeated loops into matrix-oriented operations.
  • Reduced dataframe copies and temporary object growth.
  • Moved computation toward C-backed array operations.
  • Kept output parity checks around key feature columns.
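
As an illustration of the first and last points, the sketch below turns a nested loop into a single matrix product and then checks parity against the loop output. The scoring function, array shapes, and tolerances are assumptions chosen for the example, not values taken from the project.

```python
import numpy as np


def scores_loop(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Reference implementation: explicit loops over rows, weights, columns."""
    out = np.zeros((X.shape[0], W.shape[0]))
    for i in range(X.shape[0]):
        for j in range(W.shape[0]):
            total = 0.0
            for k in range(X.shape[1]):
                total += X[i, k] * W[j, k]
            out[i, j] = total
    return out


def scores_matrix(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Same computation as one matrix product, executed in compiled code."""
    return X @ W.T


def check_parity(reference, candidate, rtol=1e-9, atol=1e-12) -> bool:
    """Raise AssertionError if the optimized output drifts from the reference."""
    np.testing.assert_allclose(candidate, reference, rtol=rtol, atol=atol)
    return True


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))   # hypothetical feature matrix
W = rng.normal(size=(3, 6))     # hypothetical weight vectors
parity_ok = check_parity(scores_loop(X, W), scores_matrix(X, W))
```

A tolerance-based comparison is used instead of exact equality because reordering floating-point sums can change the last few bits of a result.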

Metrics / Results

  • Runtime dropped from 40 minutes to 4 minutes (a 10x speedup) on roughly 1 GB of tabular data with 3 million rows, while keeping the modeling behavior intact.

Lessons Learned

  • Runtime optimization is usually data-layout work before it is model work.
  • Memory bloat can hide inside convenient dataframe chains.
  • Small Python operations become expensive when repeated millions of times.
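
One way to make hidden memory cost visible is to measure a frame before and after tightening its dtypes. The sketch below uses a synthetic frame with hypothetical columns. Whether `float32` precision is acceptable depends on the feature, so that cast is an assumption and should not be read as a general recommendation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 50_000
df = pd.DataFrame({
    "region": rng.choice(["north", "south", "east", "west"], size=n),
    "count": rng.integers(0, 1000, size=n),   # int64 by default
    "amount": rng.uniform(0, 500, size=n),    # float64
})

# deep=True also counts the Python string objects behind text columns.
before = df.memory_usage(deep=True).sum()

compact = df.assign(
    # Few distinct values: store integer codes plus a small lookup table.
    region=df["region"].astype("category"),
    # Non-negative small integers fit in a narrower unsigned type.
    count=pd.to_numeric(df["count"], downcast="unsigned"),
    # Only safe if the feature tolerates reduced precision (assumption).
    amount=df["amount"].astype(np.float32),
)
after = compact.memory_usage(deep=True).sum()

print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")
```

Each chained step such as `assign`, `merge`, or `astype` generally produces a new object, so checking `memory_usage` at a few points in a chain can show where the growth comes from.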

Future Improvements

  • Add automated profiling snapshots to CI.
  • Explore Polars or DuckDB for larger-than-memory feature workloads.
  • Track memory high-water marks in production runs.
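
A profiling snapshot and a memory high-water mark could both be captured with the standard library, as in the sketch below. `pipeline_step` is a hypothetical stand-in for a real stage. `tracemalloc` only sees allocations made through Python's memory allocators and adds overhead of its own, so in practice timing and memory runs may be better kept separate.

```python
import cProfile
import io
import pstats
import tracemalloc


def pipeline_step(n: int = 200_000) -> int:
    """Stand-in for a real pipeline stage (hypothetical workload)."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def profile_snapshot(func, *args, top: int = 5):
    """Run func once; return its result, peak traced bytes, and a text report."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    # get_traced_memory() returns (current, peak) in bytes.
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, peak_bytes, buffer.getvalue()


result, peak_bytes, report = profile_snapshot(pipeline_step)
print(f"peak traced memory: {peak_bytes / 1e6:.2f} MB")
```

In a CI job, the text report and the peak value could be stored as build artifacts and compared against the previous run to flag regressions.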