Deploy PyTorch models directly to edge devices. Text, vision, and audio AI with privacy-preserving, real-time inference — no cloud required.
Data never leaves the device. Process personal content, conversations, and media locally without cloud exposure.
Instant inference with no network round-trips. Perfect for AR/VR experiences, multimodal AI interactions, and responsive conversational agents.
Zero network dependency for inference. Works seamlessly in low-bandwidth regions, remote areas, or completely offline.
No cloud compute bills. No API rate limits. Scale to billions of users without infrastructure costs growing linearly.
The convergence of efficient architectures and edge hardware creates new opportunities
The opportunity is now: Foundation models have crossed the efficiency threshold. Deploy sophisticated AI directly where data lives.
From battery-powered phones to energy-harvesting sensors, edge devices have strict power budgets. Microcontrollers may run on milliwatts, requiring extreme efficiency.
Sustained inference generates heat without active cooling. From smartphones to industrial IoT devices, thermal throttling limits continuous AI workloads.
Edge devices range from high-end phones to tiny microcontrollers. Beyond capacity, limited memory bandwidth creates bottlenecks when moving tensors between compute units.
From microcontrollers to smartphone NPUs to embedded GPUs. Each architecture demands unique optimizations, making broad deployment across diverse form factors extremely challenging.
But deploying PyTorch models to edge devices meant losing everything that made PyTorch great
PyTorch's intuitive APIs and eager execution power breakthrough research
Multiple intermediate formats, custom runtimes, C++ rewrites
PyTorch operations don't map 1:1 to other formats
Can't trace errors back to original PyTorch code
Locked into proprietary formats with limited operator support
Teams spend months rewriting Python models in C++ for production
Direct export from PyTorch to edge. Core ATen operators preserved. No intermediate formats, no vendor lock-in.
Optimize models offline for target device capabilities. Hardware-specific performance tuning before deployment.
Pick and choose optimization steps. Composable at both compile-time and runtime for maximum flexibility.
Fully open source with hardware partner contributions. Built on PyTorch's standardized IR and operator set.
Portable C++ runtime runs on everything from microcontrollers to smartphones.
Native integration with PyTorch ecosystem, including torchao for quantization. Stay in familiar tools throughout.
Export, optimize, and run PyTorch models on edge devices
import torch
# Your existing PyTorch model
model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
# Export to create semantically equivalent graph
exported_program = torch.export.export(model, example_inputs)
Switch between backends with a single line change
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()]
).to_executorch()
# Save to .pte file
with open("model.pte", "wb") as f:
    f.write(program.buffer)
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[CoreMLPartitioner()]
).to_executorch()
# Save to .pte file
with open("model.pte", "wb") as f:
    f.write(program.buffer)
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[QnnPartitioner()]
).to_executorch()
# Save to .pte file
with open("model.pte", "wb") as f:
    f.write(program.buffer)
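Partitioners can also be combined in a single call. A minimal sketch, reusing the exported_program from above, that delegates supported operators to Core ML first and lets XNNPACK handle whatever remains on the CPU (exact operator coverage depends on the installed backends and ExecuTorch release):

from executorch.exir import to_edge_transform_and_lower
from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# Partitioners are applied in order: Core ML claims the operators it supports,
# and XNNPACK picks up the rest for CPU execution.
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[CoreMLPartitioner(), XnnpackPartitioner()]
).to_executorch()

# Save to .pte file
with open("model.pte", "wb") as f:
    f.write(program.buffer)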
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>
using namespace ::executorch::extension;
// Load the exported program and run inference on a 2x2 float tensor.
Module module("model.pte");
auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
auto outputs = module.forward(tensor);
import ExecuTorch
let module = Module(filePath: "model.pte")
let input = Tensor<Float>([1.0, 2.0, 3.0, 4.0], shape: [2, 2])
let outputs = try module.forward(input)
val module = Module.load("model.pte")
val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
val outputs = module.forward(EValue.from(inputTensor))
#import <ExecuTorch/ExecuTorch.h>
NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model" ofType:@"pte"];
ExecuTorchModule *module = [[ExecuTorchModule alloc] initWithFilePath:modelPath];
float data[] = {1.0f, 2.0f, 3.0f, 4.0f};
ExecuTorchTensor *input = [[ExecuTorchTensor alloc] initWithBytes:data
shape:@[@2, @2]
dataType:ExecuTorchDataTypeFloat];
NSArray<ExecuTorchValue *> *outputs = [module forwardWithTensor:input error:nil];
// Load model from file or buffer
const module = et.Module.load("model.pte");
// Create input tensor from array
const input = et.Tensor.fromArray([2, 2], [1.0, 2.0, 3.0, 4.0]);
// Run inference
const outputs = module.forward([input]);
Available on Android, iOS, Linux, Windows, macOS, and embedded targets such as DSPs and Cortex-M microcontrollers
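Before shipping to a device, the same .pte file can be smoke-tested on a development machine. A minimal sketch, assuming the ExecuTorch Python runtime bindings (executorch.runtime) are installed; module and method names may differ between releases:

import torch
from executorch.runtime import Runtime

# Load the exported program with the host runtime and run one inference.
runtime = Runtime.get()
program = runtime.load_program("model.pte")
method = program.load_method("forward")
outputs = method.execute([torch.randn(1, 3, 224, 224)])
print(outputs)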
Need advanced features? ExecuTorch supports memory planning, quantization, profiling, and custom compiler passes.
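For example, post-training quantization for the XNNPACK backend uses the PT2E flow before lowering. A rough sketch, reusing model and example_inputs from the export step above; the quantizer's import path has moved between torch.ao and executorch.backends across releases, so treat it as indicative:

import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Annotate the graph for symmetric 8-bit quantization.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

# Capture, calibrate with representative inputs, then convert.
captured = torch.export.export_for_training(model, example_inputs).module()
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration pass
quantized = convert_pt2e(prepared)

# Re-export and lower the quantized model exactly as before.
exported_program = torch.export.export(quantized, example_inputs)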
Try the Full Tutorial →
Run complex multimodal LLMs with simplified C++ interfaces
Choose your platform to see the multimodal API supporting text, images, and audio:
High-level APIs abstract away model complexity: just load, prompt, and get results
Explore LLM APIs →
12+ hardware backends with acceleration contributed by industry partners via open source
CPU acceleration across Arm and x86 architectures
Neural Engine and Apple Silicon optimization
Hardware-accelerated AI inference on Qualcomm platforms
Microcontroller NPU for ultra-low power
Cross-platform graphics acceleration
x86 CPU and integrated GPU optimization
Dimensity chipset acceleration
Integrated NPU optimization
Automotive and IoT acceleration
GPU acceleration on macOS and iOS
Versatile graphics framework support
Digital signal processor optimization
Production deployments and strategic partnerships accelerating edge AI
Join thousands of developers using ExecuTorch in production
Get Started Today