How to Accelerate PyTorch Training on a MacBook: A Guide to Using Apple M Processors / Silicon 2024

If you are new to machine learning on a MacBook or are transitioning from a different setup, you’re probably curious about how to run machine learning tasks on Apple’s highly praised M2 or M3 processors. Naturally, you’ll want hardware acceleration, like the acceleration CUDA provides on Nvidia GPUs, rather than relying solely on the CPU. However, since most machine learning code is written with CUDA in mind, and CUDA requires an Nvidia GPU, some modifications are required to fully utilize acceleration on Apple computers.

In this post, I’ll walk you through how to enable hardware acceleration on Apple computers equipped with the M2 processor. While this guide focuses on Apple’s M2 chip, the same principles should apply to the M3 processor as well.

If you are interested in PyTorch Performance comparisons with a Windows device, here is my other post with the details.

PyTorch Performance : MacBook Pro vs Surface Book in 2024


Setting Up a Python Development Environment on MacBook

Installing VS Code

If you’re a Windows user who isn’t yet familiar with macOS or Linux environments, this process might seem a bit different. On a Mac, most apps are ready to go after simply copying the app file—no additional installation steps are needed.

First, download the VS Code app file from the official VS Code website. When you click on the macOS version, it will download a zip file.

After the download, double-click the zip file to extract it, and the app file will appear. While you can run it directly from there, I recommend following the standard routine by copying the app file into the Applications folder.

Additionally, to make development more convenient, it’s a good idea to install the Python Extension for VS Code, though I’ll skip that process in this post.

Installing Python

A Python 3 interpreter is typically already available on macOS (on recent releases it is provided through Apple’s Command Line Tools). However, if you want to upgrade or use a different version, you can either download the installer from the official Python website or use Homebrew, a popular package manager, to install it. Personally, I recommend installing directly from the Python website, though there isn’t a significant difference (aside from the installation path).

On my system, running macOS Ventura 13.5, Python 3.9.6 is pre-installed, and I continue using this version. If you want to install another version, you can use either of the options mentioned above.

(Optional) Setting Up a Python Virtual Environment

While not mandatory, I highly recommend setting up a virtual environment (venv) when working on multiple projects. This helps avoid compatibility issues related to version differences. You can create one using the following commands, replacing the placeholders with your project folder and preferred environment name:

cd {project folder}
python3 -m venv {environment name}

Once the virtual environment is set up, you can start developing in your newly created Python environment by selecting the correct interpreter in VS Code.
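
If you want to confirm that the interpreter selected in VS Code really is the one from your virtual environment, a quick check like the following can help. This is a minimal sketch using only the standard library; the function name describe_environment is my own, chosen for illustration. The architecture field is included on the assumption that a native Apple Silicon build of Python reports arm64, while an Intel build running under Rosetta reports x86_64.

```python
import platform
import sys


def describe_environment():
    """Collect basic facts about the running Python interpreter."""
    return {
        "python_version": platform.python_version(),
        # Assumption: "arm64" indicates a native Apple Silicon build.
        "architecture": platform.machine(),
        # Inside a venv, sys.prefix points at the environment,
        # while sys.base_prefix points at the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If in_virtualenv prints False, VS Code is most likely still pointing at the system interpreter rather than at your environment.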

Installing PyTorch

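On macOS, unlike Windows or Linux systems with CUDA, you don’t have to worry about selecting the correct CUDA version for PyTorch since macOS doesn’t support CUDA. Installing PyTorch is straightforward. The command below installs the nightly (preview) build; stable releases from PyTorch 1.12 onward also include the MPS backend, so a regular install without --pre and the extra index URL should work as well. Open your terminal and run the following command to install it: [reference]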

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Verifying M2 Processor Recognition

To check if the installed environment recognizes and can use the Apple M2 processor, follow these steps. First, create a new folder in your desired location, open it in VS Code, and select the appropriate Python interpreter (choose the virtual environment if you created one). Then create a new Python file, paste in the test code below, and run it.

# Test code from https://developer.apple.com/metal/pytorch/

import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

If the installation and execution proceed without any issues, you should see a message confirming that the Apple processor has been recognized.

tensor([1.], device='mps:0')
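
If the check prints "MPS device not found." instead, it can help to distinguish between a PyTorch build that lacks MPS support and a machine or macOS version that cannot use it. The following is a small sketch of that idea, not part of Apple's test code; the function name mps_status and the wording of the messages are my own.

```python
import torch


def mps_status():
    """Return a short description of MPS support in this PyTorch install."""
    if not torch.backends.mps.is_built():
        # The installed PyTorch binary was compiled without the MPS backend.
        return "This PyTorch build was compiled without MPS support."
    if not torch.backends.mps.is_available():
        # The backend exists, but the hardware or macOS version cannot use it.
        return "MPS is built in, but no usable MPS device was found."
    return "MPS is available."


print(mps_status())
```

The first message usually points to the PyTorch installation, while the second points to the machine itself.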

Running a Training Test

Now that we’ve confirmed the recognition of the Apple processor, let’s test whether it can be used effectively for training machine learning models. We’ll run a simple MNIST classification example from PyTorch’s official examples to verify that hardware acceleration is utilized during the training process.

First, you can obtain the example code from [here]. While many examples require NumPy, this specific example can run without it.

Next, create a new Python file in your project and paste the code from the link provided. This code is already set up to automatically detect whether CUDA (for Nvidia GPUs) or MPS (Apple’s Metal Performance Shaders for M1/M2 chips) is available, and it will select the appropriate device accordingly.

There’s no need to adjust any parameters. Simply run the code, and it will use MPS whenever the backend is available. The checks described later in this post let you confirm that training actually leverages the MPS support on Apple Silicon processors. If MPS is used, it means that the M2 processor’s acceleration is functioning as expected.

Python MNIST Example File Link

To give you a quick look at the key code that enables hardware acceleration, here’s a snippet demonstrating how the device is selected. As you can see in the example, early in the program, there’s a variable that determines which device (CPU, CUDA, or MPS) to use for computations:

    if use_cuda:
        device = torch.device("cuda")
    elif use_mps:
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
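
The snippet above depends on use_cuda and use_mps being set earlier in the script. A self-contained version of the same idea could look like the sketch below. The helper name pick_device and its no_cuda / no_mps parameters are my own illustration, loosely mirroring command-line switches such as --no-mps; they are not the exact code of the example.

```python
import torch


def pick_device(no_cuda=False, no_mps=False):
    """Choose CUDA first, then MPS, then CPU, honoring the opt-out flags."""
    use_cuda = not no_cuda and torch.cuda.is_available()
    use_mps = not no_mps and torch.backends.mps.is_available()
    if use_cuda:
        return torch.device("cuda")
    if use_mps:
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(f"Using device: {device}")
```

Calling pick_device(no_mps=True) on a MacBook would fall through to the CPU, which is handy for the comparison later in this post.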

In this example, all device options are listed to ensure compatibility across different systems, but if you’re working in a single environment, there’s no need to include all the options. On a MacBook with an M1 or M2 processor, you can simplify the code to just this single line:

device = torch.device("mps")

Once you’ve selected the device, you can ensure the network is prepared to use it by moving the model and data to the chosen device. This step ensures that the computations during training will be performed on the specified device. Here’s how you can do it:

model = Net().to(device) # Net().to("mps") also works
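
The input data has to be moved in the same way, because the model and the tensors it processes must live on the same device. Here is a minimal sketch of both steps together. The small nn.Sequential model is a hypothetical stand-in for the example's Net class, the random tensors stand in for an MNIST batch, and the CPU fallback is there only so the snippet also runs on machines without MPS.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without MPS.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Hypothetical stand-in for the example's Net class.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)

# A fake batch of four 28x28 grayscale images with integer class labels.
data = torch.randn(4, 1, 28, 28)
target = torch.randint(0, 10, (4,))

# Inputs must be on the same device as the model's parameters.
data, target = data.to(device), target.to(device)

output = model(data)
print(output.shape, output.device)
```

In a real training loop, the .to(device) call on the data is typically repeated for every batch coming out of the data loader.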

Returning to the example code, once you execute the script, you’ll see the training process begin. The output reports the training loss at regular intervals, and you should see messages similar to this as training starts:

Train Epoch: 1 [0/60000 (0%)]   Loss: 2.300024
Train Epoch: 1 [640/60000 (1%)] Loss: 1.141236
Train Epoch: 1 [1280/60000 (2%)]        Loss: 0.711558
Train Epoch: 1 [1920/60000 (3%)]        Loss: 0.534700
Train Epoch: 1 [2560/60000 (4%)]        Loss: 0.394732
Train Epoch: 1 [3200/60000 (5%)]        Loss: 0.266304
Train Epoch: 1 [3840/60000 (6%)]        Loss: 0.240142
Train Epoch: 1 [4480/60000 (7%)]        Loss: 0.223248
Train Epoch: 1 [5120/60000 (9%)]        Loss: 0.607467
Train Epoch: 1 [5760/60000 (10%)]       Loss: 0.203784
Train Epoch: 1 [6400/60000 (11%)]       Loss: 0.404807
Train Epoch: 1 [7040/60000 (12%)]       Loss: 0.216695
Train Epoch: 1 [7680/60000 (13%)]       Loss: 0.167440
Train Epoch: 1 [8320/60000 (14%)]       Loss: 0.176644
Train Epoch: 1 [8960/60000 (15%)]       Loss: 0.251940
Train Epoch: 1 [9600/60000 (16%)]       Loss: 0.178195
Train Epoch: 1 [10240/60000 (17%)]      Loss: 0.453998

To easily verify whether the training is utilizing hardware acceleration, there are two straightforward methods:

Compare Training Time by Setting it to Use Only the CPU
Monitor GPU Usage During Training

Either method lets you verify whether your model is leveraging the acceleration capabilities of the M2 processor. For the first method, run the script again with the --no-mps option, which makes it train on the CPU only:

python3 python_conv_ex.py --no-mps

Even without measuring the time explicitly, you can easily notice that training becomes significantly slower when using the CPU compared to using MPS, giving you a clear sense of the acceleration provided by MPS.
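
To put a rough number on the difference, you could time the same operation on both devices. The sketch below is not the MNIST training itself; it uses repeated matrix multiplication as a stand-in workload, and the function name time_matmul and the default sizes are arbitrary choices of mine. Actual timings depend on your machine, so treat the results as indicative only.

```python
import time

import torch


def time_matmul(device, size=1024, repeats=20):
    """Return the seconds taken by `repeats` square matrix products on `device`."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    (a @ b).sum().item()  # warm-up run; .item() waits for the result
    start = time.perf_counter()
    total = torch.zeros((), device=device)
    for _ in range(repeats):
        total = total + (a @ b).sum()
    # Copying the scalar back to Python forces pending GPU work to finish,
    # so the measured time covers the whole computation.
    total.item()
    return time.perf_counter() - start


print(f"cpu: {time_matmul(torch.device('cpu')):.3f} s")
if torch.backends.mps.is_available():
    print(f"mps: {time_matmul(torch.device('mps')):.3f} s")
```

The warm-up run is there because the first operation on a device tends to include one-time setup costs that would distort the comparison.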

Monitoring GPU Usage with Activity Monitor

The second method to verify hardware acceleration is by monitoring GPU usage via macOS’s Activity Monitor. Follow these steps to open the Activity Monitor:

Search and Open Activity Monitor:
- Open Launchpad, type ‘Activity Monitor,’ and launch it.
- Alternatively, you can go to the Applications folder, then the Utilities folder, and find Activity Monitor there.
Enable GPU Monitoring: By default, GPU usage is not shown in Activity Monitor. To display GPU processes, follow these steps:
- In the Activity Monitor, go to the top menu and click [View].
- Select [GPU Processes] from the dropdown.

This will allow you to track the GPU usage in real time. When your machine learning model is running with MPS, you should observe an increase in GPU activity in this window. If no GPU processes are shown, it indicates that the model is running on the CPU instead.

Now, you should see the [% GPU] column added in the Activity Monitor. By default, it appears towards the end of the columns, but you can click and drag it to reposition it wherever you prefer for easier monitoring.

This new column will show the percentage of GPU usage, allowing you to see in real-time how much of your GPU is being utilized during model training. If you’re using MPS, you should notice this number increase while your machine learning tasks are running, confirming that hardware acceleration is active.

Now, return to VS Code and run the code you executed earlier. This time, as the training process runs, monitor the [% GPU] column in the Activity Monitor to check the GPU usage.

If the MPS backend is properly configured and being utilized, you should see the GPU usage percentage rise during the training process. This confirms that the Apple M1 or M2 processor is actively accelerating the machine learning tasks, making the training faster compared to CPU-only execution.

By observing this in real-time, you can ensure that the GPU resources are being effectively used for your project.

During training, the Python process should maintain a high level of GPU usage, confirming that the MPS backend is being utilized for acceleration. When you run the same training task on the CPU only, GPU usage stays at zero or very low, which shows that the GPU is not being used in that case.

This comparison effectively illustrates the performance benefits of using hardware acceleration on Apple’s M1/M2 processors versus CPU-only computation.

Conclusion

In this post, we explored how to leverage PyTorch’s training acceleration on Apple’s processors. With the existing examples, the implementation is straightforward, and once the environment is properly set up, following along shouldn’t be difficult. You can easily apply the same methods to other code by using the approach demonstrated in the example.
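
As a rough illustration of applying the same approach to other code, here is a minimal training loop on synthetic data. The model, the random inputs, and the hyperparameters are all hypothetical and serve only to show the pattern: select the device once, move the model to it, and move each batch to it before the forward pass.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without MPS.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Hypothetical small model and random data, only to illustrate the pattern.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 20)
labels = torch.randint(0, 2, (64,))

for step in range(5):
    # Move the batch to the same device as the model before the forward pass.
    x, y = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```

Apart from the device line and the two .to(device) calls, nothing here is specific to Apple hardware, which is why existing CUDA-oriented code usually needs only small changes.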

While Apple’s processors may not match the speed of high-end Nvidia GPUs when it comes to training tasks, and the MacBook itself can be relatively expensive, the advantage lies in its portability. For machine learning development that doesn’t involve massive datasets or heavy computational loads, using a lightweight, portable notebook like the MacBook becomes an attractive option.