{"id":1436,"date":"2020-09-03T13:32:00","date_gmt":"2020-09-03T13:32:00","guid":{"rendered":"https:\/\/nag.com\/?post_type=insights&#038;p=1111"},"modified":"2023-07-04T16:45:40","modified_gmt":"2023-07-04T16:45:40","slug":"a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure","status":"publish","type":"insights","link":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/","title":{"rendered":"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure"},"content":{"rendered":"<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Machine learning is becoming ever more powerful and prevalent in the modern world, and is used in all kinds of places, from cutting-edge science and computer games to self-driving cars and food production. However, it is a computationally intensive process, particularly during the initial training stage of a model, and it almost universally requires expensive GPU hardware to complete the training in a reasonable length of time. Because of this high hardware cost and the increasing availability of cloud computing, many ML users, both new and experienced, are migrating their workflows to the cloud to reduce costs and access the latest and most powerful hardware.<\/p>\n<p>This tutorial demonstrates porting an existing machine learning model to a virtual machine on the Microsoft Azure cloud platform. We will train a small movie recommendation model using a single GPU to give personalised recommendations. 
The total cost of performing this training should be no more than $5 using any of the single-GPU instances currently available on Azure.<\/p>\n<p>This is not the only way to perform ML training on Azure; for example, Microsoft also offer the\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-gb\/services\/machine-learning\/\">Azure ML product<\/a>, which is designed to allow rapid deployment of commonly used ML applications. However, the approach we will use here is the most flexible, as it gives the user complete control over all aspects of the software environment, and it is likely to be the fastest method of porting an existing ML workflow to Azure.<\/p>\n<h2>Requirements<\/h2>\n<p>To follow this tutorial, you will need:<\/p>\n<ul>\n<li>A copy of the tutorial files from the git repository hosted on\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\">GitHub<\/a><\/li>\n<\/ul>\n<p>And either:<\/p>\n<ul>\n<li>A system with git, ssh, and the\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/cli\/azure\/install-azure-cli?view=azure-cli-latest\">Azure CLI<\/a>\u00a0installed and logged in to a valid account. If you are using Windows, the\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/wsl\/install-win10\">Windows Subsystem for Linux<\/a>\u00a0works well for this.<\/li>\n<\/ul>\n<p>or<\/p>\n<ul>\n<li>An active session of the\u00a0<a href=\"https:\/\/shell.azure.com\/\">Azure Cloud Shell<\/a><\/li>\n<\/ul>\n<h2>Choosing a Suitable Example<\/h2>\n<p>Although many machine learning models require large amounts of expensive compute time to train, there are also models that can produce meaningful results from much smaller datasets using only a few minutes of CPU or GPU time. One such model is\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1708.05031\">Neural Collaborative Filtering<\/a>\u00a0(NCF), which can be used to produce recommendation models from user interaction and rating data. 
This makes it possible to work through all the steps interactively in a few minutes and for only a few dollars in cloud costs.<\/p>\n<p>We will be training our\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/tree\/master\/ncf\">NCF<\/a>\u00a0model using the\u00a0<a href=\"https:\/\/grouplens.org\/datasets\/movielens\/\">MovieLens-25M<\/a>\u00a0dataset from GroupLens. This dataset contains 25 million ratings of 62,000 movies from 162,000 users, along with tag genome data characterising the genre and features of the movies in the dataset. The resulting model can then be used to provide recommendations of the form &#8220;if you liked movie X you will probably also like movie Y&#8221;.<\/p>\n<p>The\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/tree\/master\/ncf\">NCF<\/a>\u00a0implementation used for this tutorial is taken from the\u00a0<a href=\"https:\/\/github.com\/NVIDIA\/DeepLearningExamples\/tree\/master\/PyTorch\/Recommendation\/NCF\">NVidia Deep Learning Examples<\/a>\u00a0repository on GitHub, with a small modification to update it to use the newest MovieLens-25M dataset.<\/p>\n<h2>GPU VM Quota on Azure<\/h2>\n<p>GPU-enabled VMs are publicly available on Azure; however, you may still need to request quota before you can create them. If you do not have quota for the NCv2 family of VMs, the tutorial examples will fail to run, with a message that you have exceeded quota.<\/p>\n<p>By default, this tutorial assumes you will run in the\u00a0<b>SouthCentralUS<\/b>\u00a0Azure region, so you should request a quota of at least 6 vCPUs for the NCv2 family in this region. To do this, go to the\u00a0<a href=\"https:\/\/portal.azure.com\/\">Azure Portal<\/a>, navigate to the subscriptions area, open your subscription, and select &#8220;Usage + Quotas&#8221; from the sidebar. From this pane, you can request additional quota using the &#8220;Request Increase&#8221; button. 
For a detailed guide, see the\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-portal\/supportability\/regional-quota-requests\">Microsoft Documentation page<\/a>.<\/p>\n<h2>Quickstart: Scripted VM Setup and Training<\/h2>\n<p>The entirety of the training process can be scripted using the Azure CLI and standard Linux tools. An example script for doing this is provided as\u00a0<code><a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/blob\/master\/deploy_and_run_training.sh\">deploy_and_run_training.sh<\/a><\/code>.<\/p>\n<p>This script executes all the commands shown below to create the VM instance and run the training. The\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/virtual-machines\/extensions\/custom-script-linux\">custom-script<\/a>\u00a0VM extension is used to manage the installation of Docker and the building of the image.<\/p>\n<p>To run the example, first ensure that you are logged in to the Azure CLI. You will then need to edit the\u00a0<code><a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/blob\/master\/deploy_and_run_training.sh\">deploy_and_run_training.sh<\/a><\/code>\u00a0script to provide your personal ssh key &#8211; this is needed to allow the script to log in to the training VM. You can then run the script, which will set up the VM instance, run the training, download the results, and delete the VM instance. 
When the script completes, you should have the final trained weights, final predictions for one of the users, and a training log downloaded to files named model.pth, predictions.csv, and training.log, respectively, in your working directory.<\/p>\n<p><b>Note: The script will attempt to clean up all resources after use, but it is strongly recommended to check this manually in the Azure portal to avoid a nasty &#8211; and expensive &#8211; surprise if something goes wrong.<\/b><\/p>\n<h2>Setting Up a Training Instance<\/h2>\n<p>The\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/tree\/master\/ncf\">NCF<\/a>\u00a0model with the MovieLens dataset is small enough to be trained in just a few minutes on a single GPU (P100 or V100), so to begin with, we will set up a single VM instance and deploy a Docker container with PyTorch that we can use to train the model. The instance type we used was &#8220;Standard_NC6s_v2&#8221;, which contains a single NVidia P100; however, you can use any instance type you like so long as it has an NVidia P100 or V100 &#8211; only the training time of the model should change.<\/p>\n<p><b>All of the setup commands below are contained in the\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/blob\/master\/deploy_and_run_training.sh\">deploy_and_run_training.sh<\/a>\u00a0script &#8211; see the &#8220;Quickstart: Scripted VM Setup and Training&#8221; section above.<\/b><\/p>\n<p>First, we will create a new resource group to hold the VM and its related materials. 
This allows easy management and deletion of the resources we have used when they are no longer needed.<\/p>\n<p>Throughout this tutorial values inside angle brackets (&lt;&gt;) represent user-specific choices of names and options and should be replaced with an appropriate value when executing the command:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint\"><code><span class=\"pln\">$ az group create <\/span><span class=\"pun\">--<\/span><span class=\"pln\">name <\/span><span class=\"str\">&lt;rg_name&gt;<\/span> <span class=\"pun\">--<\/span><span class=\"pln\">location <\/span><span class=\"typ\">SouthCentralUS<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><b>Note: you can use another location, but make sure it is one where NCv2 VMs are available.<\/b>\u00a0(Use https:\/\/azure.microsoft.com\/en-us\/global-infrastructure\/services\/ to check availability.)<\/p>\n<p>Then create a VM instance in this resource group:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ az vm create \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">-<\/span><span class=\"pln\">group <\/span><span class=\"str\">&lt;rg_name&gt;<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">name <\/span><span class=\"str\">&lt;vm_name&gt;<\/span><span class=\"pln\"> \\\n  <\/span><span 
class=\"pun\">--<\/span><span class=\"pln\">size <\/span><span class=\"typ\">Standard_NC6s_v2<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">image <\/span><span class=\"typ\">OpenLogic<\/span><span class=\"pun\">:<\/span><span class=\"typ\">CentOS<\/span><span class=\"pun\">-<\/span><span class=\"pln\">HPC<\/span><span class=\"pun\">:<\/span><span class=\"lit\">7_7<\/span><span class=\"pun\">-<\/span><span class=\"pln\">gen2<\/span><span class=\"pun\">:<\/span><span class=\"lit\">7.7<\/span><span class=\"pun\">.<\/span><span class=\"lit\">2020042001<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">ssh<\/span><span class=\"pun\">-<\/span><span class=\"pln\">key<\/span><span class=\"pun\">-<\/span><span class=\"pln\">value <\/span><span class=\"str\">&lt;sshkey&gt;<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">admin<\/span><span class=\"pun\">-<\/span><span class=\"pln\">username <\/span><span class=\"str\">&lt;admin_user&gt;<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Then the GPU driver extension needs to be installed:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ az vm extension <\/span><span class=\"typ\">set<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">-<\/span><span class=\"pln\">group <\/span><span class=\"str\">&lt;rg_name&gt;<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span 
class=\"pln\">vm<\/span><span class=\"pun\">-<\/span><span class=\"pln\">name <\/span><span class=\"str\">&lt;vm_name&gt;<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">name <\/span><span class=\"typ\">NvidiaGpuDriverLinux<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">publisher <\/span><span class=\"typ\">Microsoft<\/span><span class=\"pun\">.<\/span><span class=\"typ\">HpcCompute<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><b>Note: Currently, this extension continues to perform actions after it reports completion. You may need to wait up to an additional 10 minutes for the instance to install additional packages and reboot before the next steps can be done.<\/b><\/p>\n<p>After this completes, connect to the instance using ssh. 
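Because the driver extension keeps working in the background, it can help to poll the VM until nvidia-smi responds before carrying on. The helper below is our own sketch, not part of the tutorial repository; it assumes you already know the VM's public IP, which the next command shows how to look up.

```shell
# Sketch: wait until the NVidia driver on the VM is actually usable.
# The host argument is e.g. <admin_user>@<vm_ip> (placeholders as in
# the rest of this tutorial).
wait_for_gpu() {
    host="$1"
    # Keep retrying until nvidia-smi runs successfully over ssh,
    # i.e. the extension has finished and the VM has rebooted.
    until ssh -o ConnectTimeout=10 "$host" nvidia-smi >/dev/null 2>&1; do
        echo "GPU driver not ready on $host, retrying in 30s..."
        sleep 30
    done
    echo "GPU driver ready on $host"
}
```

Calling `wait_for_gpu <admin_user>@<vm_ip>` returns once the driver answers, at which point the Docker setup steps can proceed.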
To find the public IP address for the instance, use:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ az vm <\/span><span class=\"typ\">list<\/span><span class=\"pun\">-<\/span><span class=\"pln\">ip<\/span><span class=\"pun\">-<\/span><span class=\"pln\">addresses <\/span><span class=\"pun\">--<\/span><span class=\"pln\">name <\/span><span class=\"str\">&lt;vm_name&gt;<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Getting a Copy of the Tutorial Repository<\/h3>\n<p>Once we are logged into the VM instance, we need to acquire a local copy of the tutorial repository. 
We will do this on the local ssd of the instance, which is mounted at\u00a0<code>\/mnt\/resource:<\/code><\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><span class=\"pln\">$ sudo mkdir <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work\n$ sudo chown <\/span><span class=\"pun\">-<\/span><span class=\"pln\">R $USER <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work\n$ cd <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work\n$ git clone https<\/span><span class=\"pun\">:<\/span><span class=\"com\">\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure<\/span><span class=\"pln\">\n$ cd <\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>The MLFirstSteps_Azure directory contains all the materials needed to complete this tutorial. The\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/tree\/master\/ncf\">ncf<\/a>\u00a0model and training scripts are located in the\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/tree\/master\/ncf\">ncf<\/a>\u00a0subdirectory of this repository. 
This directory will be mounted in the Docker container in the next step.<\/p>\n<h3>Installing Docker and Building the Image<\/h3>\n<p>Once we have logged into the instance, we need to install Docker with the NVidia container runtime. Since we are using the CentOS image on our VM, we can use yum as shown:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ distribution<\/span><span class=\"pun\">=<\/span><span class=\"pln\">$<\/span><span class=\"pun\">(.<\/span> <span class=\"pun\">\/<\/span><span class=\"pln\">etc<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">os<\/span><span class=\"pun\">-<\/span><span class=\"pln\">release<\/span><span class=\"pun\">;<\/span><span class=\"pln\">echo $ID$VERSION_ID<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\n\n$ sudo yum install <\/span><span class=\"pun\">-<\/span><span class=\"pln\">y yum<\/span><span class=\"pun\">-<\/span><span class=\"pln\">utils\n\n$ sudo yum<\/span><span class=\"pun\">-<\/span><span class=\"pln\">config<\/span><span class=\"pun\">-<\/span><span class=\"pln\">manager <\/span><span class=\"pun\">--<\/span><span class=\"pln\">add<\/span><span class=\"pun\">-<\/span><span class=\"pln\">repo https<\/span><span class=\"pun\">:<\/span><span class=\"com\">\/\/download.docker.com\/linux\/centos\/docker-ce.repo<\/span><span class=\"pln\">\n$ sudo yum<\/span><span class=\"pun\">-<\/span><span class=\"pln\">config<\/span><span class=\"pun\">-<\/span><span class=\"pln\">manager <\/span><span class=\"pun\">--<\/span><span class=\"pln\">add<\/span><span class=\"pun\">-<\/span><span class=\"pln\">repo https<\/span><span class=\"pun\">:<\/span><span class=\"com\">\/\/nvidia.github.io\/nvidia-docker\/$distribution\/nvidia-docker.repo<\/span><span 
class=\"pln\">\n\n$ sudo yum install <\/span><span class=\"pun\">-<\/span><span class=\"pln\">y \\\n  docker<\/span><span class=\"pun\">-<\/span><span class=\"pln\">ce \\\n  docker<\/span><span class=\"pun\">-<\/span><span class=\"pln\">ce<\/span><span class=\"pun\">-<\/span><span class=\"pln\">cli \\\n  containerd<\/span><span class=\"pun\">.<\/span><span class=\"pln\">io \\\n  nvidia<\/span><span class=\"pun\">-<\/span><span class=\"pln\">container<\/span><span class=\"pun\">-<\/span><span class=\"pln\">toolkit \\\n  nvidia<\/span><span class=\"pun\">-<\/span><span class=\"pln\">container<\/span><span class=\"pun\">-<\/span><span class=\"pln\">runtime<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p><b>Important Note: You may receive a message that yum is &#8220;waiting for a lock&#8221;. This can occur when Azure extensions are still running in the background.<\/b><\/p>\n<p>It is also necessary to instruct Docker to use a different directory to store container data, as there is insufficient free space on Azure VM OS images:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ sudo mkdir <\/span><span class=\"pun\">-<\/span><span class=\"pln\">p <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">docker\n$ sudo mkdir <\/span><span class=\"pun\">-<\/span><span class=\"pln\">p <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">etc<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">docker\n\n$ 
sudo tee <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">etc<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">docker<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">daemon<\/span><span class=\"pun\">.<\/span><span class=\"pln\">json <\/span><span class=\"pun\">&lt;&lt;-<\/span><span class=\"pln\">EOF <\/span><span class=\"pun\">&gt;\/<\/span><span class=\"pln\">dev<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">null\n<\/span><span class=\"pun\">{<\/span>\n    <span class=\"str\">\"data-root\"<\/span><span class=\"pun\">:<\/span> <span class=\"str\">\"\/mnt\/resource\/docker\"<\/span><span class=\"pun\">,<\/span>\n    <span class=\"str\">\"runtimes\"<\/span><span class=\"pun\">:<\/span> <span class=\"pun\">{<\/span>\n        <span class=\"str\">\"nvidia\"<\/span><span class=\"pun\">:<\/span> <span class=\"pun\">{<\/span>\n            <span class=\"str\">\"path\"<\/span><span class=\"pun\">:<\/span> <span class=\"str\">\"\/usr\/bin\/nvidia-container-runtime\"<\/span><span class=\"pun\">,<\/span>\n            <span class=\"str\">\"runtimeArgs\"<\/span><span class=\"pun\">:<\/span> <span class=\"pun\">[]<\/span>\n        <span class=\"pun\">}<\/span>\n    <span class=\"pun\">}<\/span>\n<span class=\"pun\">}<\/span><span class=\"pln\">\nEOF<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Finally, add your username to the docker group and restart the docker service:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ sudo gpasswd <\/span><span class=\"pun\">-<\/span><span class=\"pln\">a $USER docker\n\n$ 
sudo systemctl restart docker<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Now we need to exit the ssh session and log back in for the user permission changes to take effect.<\/p>\n<p>Once we have logged back in, we can build the image using the Dockerfile provided in the repository:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ cd <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><span class=\"pln\">\n$ docker build <\/span><span class=\"pun\">--<\/span><span class=\"pln\">rm <\/span><span class=\"pun\">-<\/span><span class=\"pln\">t pytorch_docker <\/span><span class=\"pun\">.<\/span> <span class=\"pun\">-<\/span><span class=\"pln\">f <\/span><span class=\"typ\">Dockerfile<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Running the Training<\/h3>\n<p>To run the training, it is first necessary to launch the Docker container, mounting the training scripts directory as \/work within the container.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre 
class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ cd <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><span class=\"pln\">\n$ docker run <\/span><span class=\"pun\">--<\/span><span class=\"pln\">runtime<\/span><span class=\"pun\">=<\/span><span class=\"pln\">nvidia \\\n    <\/span><span class=\"pun\">-<\/span><span class=\"pln\">v <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><span class=\"pun\">\/:\/<\/span><span class=\"pln\">work \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">rm \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">name<\/span><span class=\"pun\">=<\/span><span class=\"str\">\"container_name\"<\/span><span class=\"pln\"> \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">shm<\/span><span class=\"pun\">-<\/span><span class=\"pln\">size<\/span><span class=\"pun\">=<\/span><span class=\"lit\">10g<\/span><span class=\"pln\"> \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">ulimit memlock<\/span><span class=\"pun\">=-<\/span><span class=\"lit\">1<\/span><span class=\"pln\"> \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">ulimit <\/span><span class=\"typ\">stack<\/span><span class=\"pun\">=<\/span><span class=\"lit\">67108864<\/span><span class=\"pln\"> \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">ipc<\/span><span class=\"pun\">=<\/span><span class=\"pln\">host \\\n    <\/span><span class=\"pun\">--<\/span><span class=\"pln\">network<\/span><span class=\"pun\">=<\/span><span 
class=\"pln\">host \\\n    <\/span><span class=\"pun\">-<\/span><span class=\"pln\">t \\\n    <\/span><span class=\"pun\">-<\/span><span class=\"pln\">i pytorch_docker \\\n    bash<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>The final step before running the training is to download and prepare the dataset. This is done using the\u00a0<a href=\"https:\/\/github.com\/numericalalgorithmsgroup\/MLFirstSteps_Azure\/blob\/master\/ncf\/prepare_dataset.sh\">prepare_dataset.sh<\/a>\u00a0script:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ cd <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">ncf\n$ <\/span><span class=\"pun\">.\/<\/span><span class=\"pln\">prepare_dataset<\/span><span class=\"pun\">.<\/span><span class=\"pln\">sh<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Finally, run the training. The DeepLearningExamples repository\u00a0<a href=\"https:\/\/github.com\/NVIDIA\/DeepLearningExamples\/tree\/master\/PyTorch\/Recommendation\/NCF\">readme<\/a>\u00a0gives details of the various options that can be passed to the training. 
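Before launching a long run, it is worth a quick sanity check that the container can actually see the GPU. The helper below is our own generic sketch (not part of the tutorial repository), intended to be run inside the container started above: it simply asks nvidia-smi, which only succeeds if the NVidia runtime was wired up correctly.

```shell
# Sketch: report whether the NVidia runtime has exposed a GPU to this
# container. Run inside the container started above.
gpu_visible() {
    if nvidia-smi >/dev/null 2>&1; then
        echo "GPU visible"
    else
        echo "no GPU visible - check the --runtime=nvidia flag"
    fi
}
```

If `gpu_visible` reports no GPU, re-check the `docker run` invocation before starting the training.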
For this example, we will run the training until an accuracy of 0.979 is attained.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ python <\/span><span class=\"pun\">-<\/span><span class=\"pln\">m torch<\/span><span class=\"pun\">.<\/span><span class=\"pln\">distributed<\/span><span class=\"pun\">.<\/span><span class=\"pln\">launch \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">nproc_per_node<\/span><span class=\"pun\">=<\/span><span class=\"lit\">1<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">use_env ncf<\/span><span class=\"pun\">.<\/span><span class=\"pln\">py \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">data <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">data<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">cache<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">ml<\/span><span class=\"pun\">-<\/span><span class=\"lit\">25m<\/span><span class=\"pln\"> \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">checkpoint_dir <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work \\\n  <\/span><span class=\"pun\">--<\/span><span class=\"pln\">threshold <\/span><span class=\"lit\">0.979<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Inferencing: Recommending Movies<\/h3>\n<p>Having trained the model, we can now use it to recommend additional movies. For each user\/movie pairing, the model gives a predicted user rating between 0 and 1. 
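As a toy illustration of how such scores can be turned into a ranked list with standard Unix tools (the data here is entirely hypothetical, not real model output; the actual predictions are generated below with the provided script):

```shell
# Illustrative only: a tiny stand-in file of "movie,predicted rating"
# pairs in arbitrary order - not real model output.
cat > scores.csv <<'EOF'
Jumanji (1995),0.88
Toy Story (1995),0.97
Heat (1995),0.91
EOF

# Rank by the numeric score (comma-separated field 2), highest first,
# and keep the top entries as the recommendation list.
sort -t, -k2 -rn scores.csv | head -n 2
```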
The movies with the highest predicted ratings that the user has not yet rated can then be used as recommendations for that user.<\/p>\n<p>The provided ncf\/userinference.py script gives an example of how to generate predictions from the trained model. It can be run either on the remote machine or on a local machine with PyTorch installed, and it does not require a GPU. It takes two command-line arguments: the first is the path to the trained model file, and the second is the path to the original movies.csv file from the dataset &#8211; this is used to map movie IDs back to their names.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><span class=\"pln\">$ python userinference<\/span><span class=\"pun\">.<\/span><span class=\"pln\">py <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">model<\/span><span class=\"pun\">.<\/span><span class=\"pln\">pth <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">data<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">ml<\/span><span class=\"pun\">-<\/span><span class=\"lit\">25m<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">movies<\/span><span class=\"pun\">.<\/span><span class=\"pln\">csv <\/span><span class=\"pun\">--<\/span><span class=\"pln\">output<\/span><span class=\"pun\">-<\/span><span class=\"pln\">dir <\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>The script will output the predictions sorted by descending rating in the file predictions.csv.<\/p>\n<p>By default, the script will generate a predicted 
rating for all movies in the dataset for the highest user ID number.<\/p>\n<h3>Downloading the results to your local machine<\/h3>\n<p>The results can now be copied back via ssh secure copy. To do this, use scp from your local machine:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-cpp prettyprinted\"><code><span class=\"pln\">$ scp <\/span><span class=\"str\">&lt;vm_ip&gt;<\/span><span class=\"pun\">:\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">model<\/span><span class=\"pun\">.<\/span><span class=\"pln\">pth <\/span><span class=\"pun\">.<\/span><span class=\"pln\">\n$ scp <\/span><span class=\"str\">&lt;vm_ip&gt;<\/span><span class=\"pun\">:\/<\/span><span class=\"pln\">mnt<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">resource<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">work<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">MLFirstSteps_Azure<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">first_steps_example<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">predictions<\/span><span class=\"pun\">.<\/span><span class=\"pln\">csv <\/span><span class=\"pun\">.<\/span><\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Deleting the Instance After Use<\/h3>\n<p>To avoid being billed for more resources than needed it is important to delete the VM instance and associated resources after 
use.<\/p>\n<p>If you created a resource group specifically for the tutorial resources, the whole group can be deleted at once:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-sh\"><code>$ az group delete --name &lt;rg_name&gt;<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Alternatively, if you wish to retain the other resources and delete just the VM instance, use the <code>az vm delete<\/code> command:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-sh\"><code>$ az vm delete --resource-group &lt;rg_name&gt; --name &lt;vm_name&gt;<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <p>Either way, make sure that you&#8217;ve deleted everything you expected to by checking in the Azure Portal.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default 
\">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Conclusions and Additional Resources<\/h3>\n<p>Having followed this tutorial you should have an idea of the steps involved in deploying an existing machine learning workflow on the Azure platform using Docker containers. The key steps in such a workflow are:<\/p>\n<ol>\n<li>Creating a suitable virtual machine instance and connecting to it via SSH<\/li>\n<li>Installing the machine learning framework &#8211; for example using Docker<\/li>\n<li>Preparing the model and data<\/li>\n<li>Running the training<\/li>\n<li>Downloading the results<\/li>\n<li>Cleaning up resources after use<\/li>\n<\/ol>\n<h3>What Haven&#8217;t We Told You (Yet)?<\/h3>\n<p>Running in the cloud gives you a lot of flexibility in terms of the machine types and pricing options available. Here are a couple of extra things to consider if you are planning to deploy ML to the cloud in earnest:<\/p>\n<h4>Spot Pricing<\/h4>\n<p>By default, Azure VMs are charged at &#8220;Pay as you go&#8221; pricing rates. This gives you guaranteed access to the VM until you choose to stop it, but it is the most expensive way to pay for compute on Azure.<\/p>\n<p>If you have a workload that can be interrupted\u00a0or are happy to take the risk that your job might not complete, you can use the\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-gb\/pricing\/spot\/\">spot pricing tier<\/a>\u00a0instead. 
This allows you to purchase unused compute capacity at a large discount (typically 80-90%), with the risk that your workload could be evicted at any time.<\/p>\n<p>To make use of the spot pricing tier, pass the <code>--priority Spot<\/code> option to the Azure CLI when creating your VM:<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-10 col-xl-10\">\n            <pre class=\"prettyprint lang-sh\"><code>$ az vm create \\\n  --resource-group &lt;rg_name&gt; \\\n  --name &lt;vm_name&gt; \\\n  --priority Spot \\\n  --size Standard_NC6s_v2 \\\n  --image OpenLogic:CentOS-HPC:7_7-gen2:7.7.2020042001 \\\n  --ssh-key-value &lt;ssh_public_key&gt; \\\n  --admin-username &lt;admin_username&gt;<\/code><\/pre>\n        <\/div>\n    <\/div>\n<\/div>\n\n<div class=\"container content-area-default \">\n    <div class=\"row justify-content--center\">\n        <div class=\"col-12 col-md-10 col-lg-8 col-xl-6\">\n            <h3>Different instance types<\/h3>\n<p>The Azure platform offers a variety of GPU instances with different types and numbers of GPUs available. In this tutorial we have used the &#8220;Standard_NC6s_v2&#8221; VM type with a single NVIDIA P100; however, we could also run this training on a V100-equipped VM or a VM with multiple GPUs. To do this, we would simply change the requested VM size when calling\u00a0<code>az vm create<\/code>. For example, for a faster result at a higher cost than the Standard_NC6s_v2, we could use a Standard_NC12s_v2 instance, which is equipped with two P100 GPUs. Then, to take advantage of both GPUs, pass a value of 2 to <code>--nproc_per_node<\/code> when launching the training.<\/p>\n<h3>Cloud HPC Migration Service<\/h3>\n<p><span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG offers a\u00a0Cloud HPC Migration Service\u00a0and HPC consulting to help organisations optimize their numerical applications for the cloud and HPC. 
For impartial, vendor-agnostic advice on HPC and to find out how <span class=\"nag-n-override\" style=\"margin-left: 0 !important;\"><i>n<\/i><\/span>AG can help you migrate to the cloud see\u00a0<a href=\"https:\/\/nag.com\/hpc-services\/\">HPC Services<\/a>.<\/p>\n        <\/div>\n    <\/div>\n<\/div>\n\n\n<div class=\"gbc-title-banner tac tac-lg tac-xl\" style='border-radius: 0px; '>\n    <div class=\"container\" style='border-radius: 0px; '>\n        <div class=\"row justify-content--center\" >\n            <div class=\"col-12\"  >\n                <div class=\"wrap pv-4 \" style=\"0pxbackground-color: \">\n                                <div class=\"col-12 col-md-12 col-lg-12 col-xl-12  banner-content\"  >\n    \n                    \n                    <div class=\"mt-1 mb-1 content\"><\/div>\n\n                    \n                    <a href='https:\/\/nag.com\/contact-us\/' style='background-color: #ff7d21ff; color: #ffffffff; border-radius: 30px; font-weight: 600; ' class='btn mr-1  ' target=\"_blank\">Connect me to an expert <i class='fas fa-angle-right'><\/i><\/a>                <\/div>\n                <\/div>\n            <\/div>\n        <\/div>\n    <\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning is becoming ever more powerful and prevalent in the modern world, and is being used in all kinds of places from cutting-edge science to computer games, and self-driving cars to food production. 
However, it is a computationally-intensive process &#8211; particularly for the initial training stage of the model, and almost universally requires expensive GPU hardware to complete the training in a reasonable length of time.<\/p>\n","protected":false},"author":9,"featured_media":1112,"parent":0,"menu_order":0,"template":"","meta":{"content-type":"","footnotes":""},"post-tag":[28,30],"class_list":["post-1436","insights","type-insights","status-publish","has-post-thumbnail","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - nAG<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - nAG\" \/>\n<meta property=\"og:description\" content=\"Machine learning is becoming ever more powerful and prevalent in the modern world, and is being used in all kinds of places from cutting-edge science to computer games, and self-driving cars to food production. 
However, it is a computationally-intensive process - particularly for the initial training stage of the model, and almost universally requires expensive GPU hardware to complete the training in a reasonable length of time.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/\" \/>\n<meta property=\"og:site_name\" content=\"nAG\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-04T16:45:40+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning-1024x531.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"531\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@NAGTalk\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/\",\"url\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/\",\"name\":\"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - 
nAG\",\"isPartOf\":{\"@id\":\"https:\/\/nag.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg\",\"datePublished\":\"2020-09-03T13:32:00+00:00\",\"dateModified\":\"2023-07-04T16:45:40+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg\",\"width\":8082,\"height\":4191,\"caption\":\"Machine Deep learning algorithms, Artificial intelligence, AI, Automation and modern technology in business as concept.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/nag.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Insights\",\"item\":\"https:\/\/nag.com\/insights\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"A Low-Cost Introduction to Machine Learning Training on Microsoft 
Azure\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/nag.com\/#website\",\"url\":\"https:\/\/nag.com\/\",\"name\":\"NAG\",\"description\":\"Robust, trusted numerical software and computational expertise.\",\"publisher\":{\"@id\":\"https:\/\/nag.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/nag.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/nag.com\/#organization\",\"name\":\"Numerical Algorithms Group\",\"alternateName\":\"NAG\",\"url\":\"https:\/\/nag.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"contentUrl\":\"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png\",\"width\":1244,\"height\":397,\"caption\":\"Numerical Algorithms Group\"},\"image\":{\"@id\":\"https:\/\/nag.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/NAGTalk\",\"https:\/\/www.linkedin.com\/company\/nag\/\",\"https:\/\/www.youtube.com\/user\/NumericalAlgorithms\",\"https:\/\/github.com\/numericalalgorithmsgroup\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - nAG","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/","og_locale":"en_US","og_type":"article","og_title":"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - nAG","og_description":"Machine learning is becoming ever more powerful and prevalent in the modern world, and is being used in all kinds of places from cutting-edge science to computer games, and self-driving cars to food production. However, it is a computationally-intensive process - particularly for the initial training stage of the model, and almost universally requires expensive GPU hardware to complete the training in a reasonable length of time.","og_url":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/","og_site_name":"nAG","article_modified_time":"2023-07-04T16:45:40+00:00","og_image":[{"width":1024,"height":531,"url":"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning-1024x531.jpeg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_site":"@NAGTalk","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/","url":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/","name":"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure - 
nAG","isPartOf":{"@id":"https:\/\/nag.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage"},"image":{"@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage"},"thumbnailUrl":"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg","datePublished":"2020-09-03T13:32:00+00:00","dateModified":"2023-07-04T16:45:40+00:00","breadcrumb":{"@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#primaryimage","url":"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg","contentUrl":"https:\/\/nag.com\/wp-content\/uploads\/2023\/05\/machine-learning.jpeg","width":8082,"height":4191,"caption":"Machine Deep learning algorithms, Artificial intelligence, AI, Automation and modern technology in business as concept."},{"@type":"BreadcrumbList","@id":"https:\/\/nag.com\/insights\/a-low-cost-introduction-to-machine-learning-training-on-microsoft-azure\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/nag.com\/"},{"@type":"ListItem","position":2,"name":"Insights","item":"https:\/\/nag.com\/insights\/"},{"@type":"ListItem","position":3,"name":"A Low-Cost Introduction to Machine Learning Training on Microsoft Azure"}]},{"@type":"WebSite","@id":"https:\/\/nag.com\/#website","url":"https:\/\/nag.com\/","name":"NAG","description":"Robust, trusted numerical software and computational 
expertise.","publisher":{"@id":"https:\/\/nag.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/nag.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/nag.com\/#organization","name":"Numerical Algorithms Group","alternateName":"NAG","url":"https:\/\/nag.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/nag.com\/#\/schema\/logo\/image\/","url":"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png","contentUrl":"https:\/\/nag.com\/wp-content\/uploads\/2023\/11\/NAG-Logo.png","width":1244,"height":397,"caption":"Numerical Algorithms Group"},"image":{"@id":"https:\/\/nag.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/NAGTalk","https:\/\/www.linkedin.com\/company\/nag\/","https:\/\/www.youtube.com\/user\/NumericalAlgorithms","https:\/\/github.com\/numericalalgorithmsgroup"]}]}},"_links":{"self":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1436","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights"}],"about":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/types\/insights"}],"author":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/users\/9"}],"version-history":[{"count":5,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1436\/revisions"}],"predecessor-version":[{"id":3236,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/insights\/1436\/revisions\/3236"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media\/1112"}],"wp:attachment":[{"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/media?parent=1436"}],"wp:term":[{"taxonomy":"post-tag","embeddable":true,"href":"https:\/\/nag.com\/wp-json\/wp\/v2\/post-tag?post=1436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}