The MAX Platform accelerates AI.
It's programmable.
We rebuilt the modern AI software stack from the ground up to accelerate any AI pipeline on any hardware.
Download & install in your terminal now
Available on Linux & Windows (WSL) now, Mac 🍎 soon!
By downloading, you accept our Terms.
Programmable, performant & portable
Full programmability
MAX is built from the ground up on Mojo, empowering AI engineers to unlock the full potential of AI hardware by combining the usability of Python, the safety of Rust, and the performance of C.
Unparalleled performance
MAX unlocks state-of-the-art performance for your AI models. Its next-generation compiler lets you extend and optimize your AI pipelines without rewriting them.
Seamless portability
Seamlessly move your models and AI pipelines to any hardware target, maximizing your performance-to-cost ratio and avoiding vendor lock-in.
Unparalleled latency & cost savings
MAX unlocks state-of-the-art latency and throughput for your AI pipeline, including generative models, helping you quickly productionize AI pipelines and realize massive cost savings on your cloud bill.
Modular is 1.7x faster than TensorFlow when running Stable Diffusion-UNet on CPU.
Modular is 1.7x faster than PyTorch when running Stable Diffusion-UNet on CPU.
An integrated AI developer experience
The Modular Accelerated Xecution (MAX) platform is a unified set of tools and libraries that provides everything you need to deploy low-latency, high-throughput, real-time AI inference pipelines into production.
MAX components
Mojo
Learn about Mojo
A programming language that combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models for all AI engineers.
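To make the Python-meets-systems claim concrete, here is a minimal illustrative Mojo sketch (not taken from the MAX docs; the function names are hypothetical): a statically typed, compiled `fn` called from Python-style `def` code.

```mojo
# Illustrative sketch: a compiled `fn` requires declared argument
# and return types, much like Rust or C.
fn add(a: Int, b: Int) -> Int:
    return a + b

# `def` offers Python-style flexibility alongside compiled `fn` code.
def main():
    print(add(2, 3))
```

Both styles coexist in one file, which is how Mojo keeps Python's usability while compiling to hardware-level performance.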
MAX Engine
Learn about MAX Engine
A model inference runtime and API library that executes all your AI pipelines on any hardware with unparalleled performance and cost savings.
MAX Serving
Learn about MAX Serving
A model serving library for MAX Engine that provides full interoperability with existing serving systems (e.g., Triton) and deploys seamlessly within existing container infrastructure (e.g., Kubernetes).
Latest about Modular
Why Modular?
01
Built by the world's AI experts
Our team has built most of the world’s existing AI infrastructure, including TensorFlow, PyTorch, ONNX, and XLA, and we’ve built and scaled dev tools like Swift, LLVM, and MLIR. Now we’re focused on rebuilding AI infrastructure for the world.
02
Reinvented from the ground up
To unlock the next wave of AI innovation, we started with a "first principles" approach to building the lowest layers of the AI stack, rather than piling ever more complexity on top of already over-complicated existing solutions.
03
Infrastructure that just works
We build technology that meets you where you are. We don’t require you to rewrite your models, workflows, or application code, grapple with confusing converters, or be a hardware expert to take advantage of bleeding-edge technology.
Try MAX right now
Up and running, for free, in 5 minutes.