MLX · Core ML · On-Device ML

MLX vs Core ML: Choosing an On-Device Inference Stack

April 8, 2026 · 10 min read · By Samith Wijesinghe

MLX · Core ML · iOS · Edge AI

Apple ships two ways to run machine learning on-device, and they're not competitors so much as different tools. Core ML is the mature, OS-integrated path; MLX is the flexible, research-friendly newcomer built for Apple silicon. Here's how I decide between them on real iOS projects.

The one-line summary

Use Core ML for fixed, ahead-of-time-compiled models where you want maximum Neural Engine efficiency and tight OS integration. Reach for MLX when you need dynamic shapes, custom sampling, KV-cache control, runtime quantization, or you're shipping a generative LLM.

Core ML: the OS-native path

Core ML has been Apple's on-device ML framework since 2017. You convert a model to a .mlpackage ahead of time, and the OS schedules it across CPU, GPU and the Neural Engine (ANE) automatically. Strengths:

Neural Engine access. Core ML is the primary way to actually use the ANE, which is dramatically more power-efficient than the GPU for the operations it supports.
OS integration. Vision, Natural Language, Sound Analysis and Create ML all sit on top of it.
Battery & thermals. For vision and audio models, ANE execution sips power.

The cost is rigidity: the compute graph is largely fixed at conversion time, dynamic shapes are awkward, and iterating on a generative model's sampling loop is painful.
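To make the dynamic-shape friction concrete: one common workaround (an assumption here, not something this article prescribes) is to export the model for a small set of fixed sequence lengths and pad every input up to the nearest one. coremltools exposes flexible inputs through `EnumeratedShapes` and `RangeDim`; the sketch below models only the app-side padding in plain Python, and the bucket sizes and pad token id are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sequence lengths the converted model was exported for.
BUCKETS = [32, 64, 128, 256]
PAD_ID = 0  # hypothetical padding token id


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad token_ids up to the smallest bucket that fits.

    Returns (padded_ids, mask), where mask is 1 for real tokens and 0 for padding.
    Raises ValueError if the input is longer than the largest bucket.
    """
    n = len(token_ids)
    i = bisect_left(buckets, n)  # index of the first bucket >= n
    if i == len(buckets):
        raise ValueError(f"length {n} exceeds largest bucket {buckets[-1]}")
    size = buckets[i]
    padded = list(token_ids) + [pad_id] * (size - n)
    mask = [1] * n + [0] * (size - n)
    return padded, mask


padded, mask = pad_to_bucket(list(range(1, 41)))  # 40 real tokens
print(len(padded), sum(mask))  # 64 40
```

Every padded position still costs compute, and anything longer than the largest bucket has to be truncated or rejected, which is the kind of rigidity described above.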

MLX: built for Apple silicon

MLX is Apple's open-source array framework with a NumPy-like API. It was designed around unified memory and treats Apple silicon as a first-class target. Strengths:

Dynamic by default. Define and change graphs at runtime — ideal for token-by-token LLM generation.
Built-in quantization. 4-bit and 8-bit quantization are first-class, which is exactly what you need to fit a model on a phone.
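Lazy evaluation + unified memory. Computation is deferred until a result is actually needed, and there are no host/device copies: arrays live in shared memory, and you pick the device per operation instead of moving data.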
A real LLM ecosystem. mlx-lm and mlx-swift make converting and running language models genuinely pleasant.
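A back-of-the-envelope sketch of why quantization decides whether a model fits on a phone. It assumes group-wise quantization in which each group of weights stores one 16-bit scale and one 16-bit bias; the group size of 64 and the 3-billion-parameter model are illustrative assumptions, and the estimate covers weights only (no activations, no KV cache).

```python
def weight_bytes(n_params, bits, group_size=64, meta_bits=32):
    """Approximate weight storage in bytes.

    bits:       bits per stored weight (16 means unquantized half precision)
    group_size: number of weights sharing one scale and bias (assumed layout)
    meta_bits:  bits of scale + bias stored per group (assumed 16 + 16)
    """
    if bits >= 16:
        return n_params * bits / 8  # unquantized: no per-group metadata
    bits_per_weight = bits + meta_bits / group_size
    return n_params * bits_per_weight / 8


N = 3_000_000_000  # hypothetical 3B-parameter model
for bits in (16, 8, 4):
    gib = weight_bytes(N, bits) / 2**30
    print(f"{bits:>2}-bit: {gib:.2f} GiB")
```

Under these assumptions the weights come to roughly 5.6 GiB at 16-bit, 3.0 GiB at 8-bit and 1.6 GiB at 4-bit, which is the difference between not fitting at all and leaving room for the rest of the app.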

The trade-off: MLX runs on the GPU (and CPU) but not on the ANE today, so for vision and audio workloads that map well to the ANE, Core ML is usually more power-efficient.
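The "dynamic by default" point is easiest to see in a generation loop, where each step's output decides the next input and the final length is unknown up front. Below is a framework-free sketch of one kind of custom sampling (temperature plus top-k) driving a token-by-token loop. `toy_logits`, the vocabulary size and the end-of-sequence id are stand-ins for a real model and are not MLX APIs.

```python
import math
import random


def sample_top_k(logits, k=3, temperature=0.8, rng=random):
    """Sample a token id from the k highest logits after temperature scaling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


def toy_logits(tokens, vocab_size=8):
    """Stand-in for a model forward pass: favors (last token + 1) mod vocab."""
    target = (tokens[-1] + 1) % vocab_size
    return [5.0 if i == target else 0.0 for i in range(vocab_size)]


def generate(prompt, max_new_tokens=10, eos_id=7, seed=0):
    """Append sampled tokens until eos_id appears or the budget runs out."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_id = sample_top_k(toy_logits(tokens), rng=rng)
        tokens.append(next_id)
        if next_id == eos_id:  # data-dependent exit
            break
    return tokens


print(generate([0]))
```

Changing `k`, the temperature, or the stopping rule is a one-line edit here; that kind of iteration is what runtime control over the loop buys you.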

Side by side

Model type

Vision / audio / classifiers → Core ML. Generative LLMs and anything with dynamic control flow → MLX.

Hardware target

Need the Neural Engine for efficiency → Core ML. GPU-bound LLM throughput with unified memory → MLX.

Iteration speed

Stable model you compile once → Core ML. Swapping models, tuning sampling, quantizing on the fly → MLX.

Distribution

Core ML's .mlpackage is the cleanest to bundle; MLX models are weight directories you load at runtime, which is more flexible but takes slightly more plumbing.

A decision checklist

Generative LLM on-device? → MLX. (See running LLMs on iPhone with MLX.)
Vision/audio model that maps cleanly to the ANE? → Core ML.
Need custom sampling, KV-cache control, or runtime quantization? → MLX.
Battery-critical, always-on inference? → Core ML on the Neural Engine.
Prototyping and iterating fast? → MLX, then consider exporting to Core ML if you need the ANE.
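The checklist can also be read as a small decision function. The sketch below is one possible encoding: the field names, the order in which the rules are checked (MLX-specific needs first) and the Core ML default are assumptions made for illustration, not part of either framework.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    generative_llm: bool = False
    needs_runtime_control: bool = False  # custom sampling, KV cache, runtime quantization
    maps_to_ane: bool = False            # vision/audio model the ANE supports
    battery_critical: bool = False       # always-on inference
    prototyping: bool = False


def choose_stack(w: Workload) -> str:
    """Return "MLX" or "Core ML" for a workload, following the checklist."""
    if w.generative_llm or w.needs_runtime_control:
        return "MLX"
    if w.maps_to_ane or w.battery_critical:
        return "Core ML"
    if w.prototyping:
        return "MLX"  # consider exporting to Core ML later if the ANE matters
    return "Core ML"  # assumed default for a fixed, compile-once model


print(choose_stack(Workload(generative_llm=True)))                      # MLX
print(choose_stack(Workload(maps_to_ane=True, battery_critical=True)))  # Core ML
```

A real app would call this per model rather than per app, since the perception and generative layers often land on different stacks.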

Both are pillars of edge AI on iOS, and in a mature app you'll often use both: Core ML for the perception layer, MLX for the generative layer. For the broader argument on staying on-device at all, see Edge AI on iOS: why on-device beats the cloud.

◆ Written by Samith Wijesinghe — iOS engineer & ML researcher specializing in on-device ML, MLX and edge AI.