
MLX vs Core ML: Choosing an On-Device Inference Stack

April 8, 2026 · 10 min read · By Samith Wijesinghe
MLX · Core ML · iOS · Edge AI

Apple ships two main frameworks for running machine learning on device, and they are less competitors than different tools. Core ML is the mature, OS-integrated path; MLX is the flexible, research-friendly newcomer built for Apple silicon. Here's how I decide between them on real iOS projects.

The one-line summary

Use Core ML for fixed, ahead-of-time-compiled models where you want maximum Neural Engine efficiency and tight OS integration. Reach for MLX when you need dynamic shapes, custom sampling, KV-cache control, or runtime quantization, or when you're shipping a generative LLM.

Core ML: the OS-native path

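Core ML has been Apple's on-device ML framework since 2017. You convert a model to a .mlpackage ahead of time, and the OS schedules it across the CPU, GPU and Neural Engine (ANE) automatically. Its strengths follow from that design: Neural Engine efficiency, tight OS integration, and a single artifact that is easy to bundle.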

The cost is rigidity: the compute graph is largely fixed at conversion time, dynamic shapes are awkward, and iterating on a generative model's sampling loop is painful.
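To make the rigidity concrete: when a converted graph only accepts a handful of input lengths, a common workaround is to pad each input up to the nearest supported size and carry a mask alongside it. The following is a minimal plain-Python sketch of that idea, not Core ML API; BUCKETS, PAD_ID and pad_to_bucket are hypothetical names chosen for illustration.

```python
from bisect import bisect_left

# Hypothetical sequence lengths the converted model was exported to accept.
BUCKETS = [16, 32, 64, 128]
PAD_ID = 0  # hypothetical padding token ID


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad token_ids to the smallest bucket that fits; return (padded, mask)."""
    n = len(token_ids)
    i = bisect_left(buckets, n)  # index of the first bucket >= n
    if i == len(buckets):
        raise ValueError(f"input of length {n} exceeds largest bucket {buckets[-1]}")
    size = buckets[i]
    padded = list(token_ids) + [pad_id] * (size - n)
    mask = [1] * n + [0] * (size - n)  # 1 = real token, 0 = padding
    return padded, mask
```

Every input pays for the bucket it lands in, and anything longer than the largest bucket is rejected outright, which is the kind of friction a dynamic-shape runtime avoids.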

MLX: built for Apple silicon

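MLX is Apple's open-source array framework with a NumPy-like API. It was designed around unified memory and treats Apple silicon as a first-class target. Its strengths are the flexibility Core ML lacks: dynamic shapes, custom sampling, KV-cache control, and runtime quantization.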

The trade-off: MLX runs on the GPU (and CPU) but does not target the ANE today, so for some vision and audio workloads Core ML is the more power-efficient choice.

Side by side

Model type

Vision / audio / classifiers → Core ML. Generative LLMs and anything with dynamic control flow → MLX.

Hardware target

Need the Neural Engine for efficiency → Core ML. GPU-bound LLM throughput with unified memory → MLX.

Iteration speed

Stable model you compile once → Core ML. Swapping models, tuning sampling, quantizing on the fly → MLX.
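"Tuning sampling" usually means changing a few lines like the ones below between runs. This is a plain-Python sketch of temperature and top-k sampling over raw logits, written without MLX so it runs anywhere; sample_token and its parameters are illustrative names, not part of either framework.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Rank candidates by score and keep the top_k best (all of them if top_k is None).
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:max(1, top_k)]

    # Softmax numerators over the scaled logits; subtract the max for stability.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # random.choices normalizes the weights, so no explicit division is needed.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In a runtime where the sampling step is ordinary code like this, changing temperature or top_k does not require reconverting the model.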

Distribution

Core ML's .mlpackage is the cleanest to bundle; MLX models are weight directories you load at runtime, which is more flexible but slightly more plumbing.

A decision checklist

  1. Generative LLM on-device? → MLX. (See running LLMs on iPhone with MLX.)
  2. Vision/audio model that maps cleanly to the ANE? → Core ML.
  3. Need custom sampling, KV-cache control, or runtime quantization? → MLX.
  4. Battery-critical, always-on inference? → Core ML on the Neural Engine.
  5. Prototyping and iterating fast? → MLX, then consider exporting to Core ML if you need the ANE.
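The checklist can also be read as a first-match rule list. Here is a small sketch of that reading in plain Python; the Workload fields and choose_stack are hypothetical names, and both the top-to-bottom precedence and the "undecided" fallback are my assumptions rather than something the checklist spells out.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a model to ship; fields mirror the checklist."""
    generative_llm: bool = False
    maps_to_ane: bool = False            # vision/audio model that maps cleanly to the ANE
    needs_runtime_control: bool = False  # custom sampling, KV-cache control, runtime quantization
    battery_critical: bool = False       # always-on inference
    prototyping: bool = False


def choose_stack(w: Workload) -> str:
    """Walk the checklist top to bottom and return the first match."""
    if w.generative_llm:
        return "MLX"
    if w.maps_to_ane:
        return "Core ML"
    if w.needs_runtime_control:
        return "MLX"
    if w.battery_critical:
        return "Core ML"
    if w.prototyping:
        return "MLX"
    return "undecided"  # assumption: the checklist gives no default
```

For example, choose_stack(Workload(battery_critical=True, prototyping=True)) returns "Core ML", because the battery rule sits above the prototyping rule in the list.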

Both are pillars of edge AI on iOS, and in a mature app you'll often use both: Core ML for the perception layer, MLX for the generative layer. For the broader argument on staying on-device at all, see Edge AI on iOS: why on-device beats the cloud.

Written by Samith Wijesinghe — iOS engineer & ML researcher specializing in on-device ML, MLX and edge AI.

Keep reading

Running LLMs on iPhone with Apple's MLX framework
The practical guide to on-device LLM inference.
Edge AI on iOS: why on-device beats the cloud
Privacy, latency, cost and offline reliability.