Create and deploy your Machine Learning Model with Amazon SageMaker
Amazon SageMaker is a fully managed machine learning (ML) service for training, compiling, and deploying machine learning models. It reduces the complexity of the individual stages of an ML task. Amazon SageMaker helps to train models with large amounts of data, find suitable parameters using hyperparameter tuning, and compile the model to run in the cloud and at the edge.
Amazon SageMaker services are:
Easy access to the data
Ground truth labeling
Training and tuning of ML models
Deploy models to production
ML using Jupyter Notebooks
Support for the deep learning frameworks TensorFlow, Keras, PyTorch, and MXNet
SageMaker functionality is available through the high-level Amazon SageMaker Python SDK or through Boto3, the AWS SDK for Python. Boto3 contains modules for the different AWS services. The Amazon SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker. It is easy to use with a SageMaker notebook instance and is recommended for beginners.
A SageMaker notebook instance is a fully managed ML compute instance running the Jupyter Notebook App. It contains the related resources for processing data and for training and deploying ML models. Training followed by deployment or Neo compilation can be done together in one Jupyter notebook. The notebook instance also supports code management using repositories, so the code can be accessed from other devices.
A model deployment on an edge device using a SageMaker Notebook instance includes the following steps:
SageMaker Notebook Instance
In this example, we deploy an object detection model on a Jetson Nano. The model detects persons with and without a mask. We train and compile the model using Amazon SageMaker and then deploy it on the Jetson Nano.
Create an AWS account if you do not have one. An AWS account can be created for free. A new user has access to the AWS Free Tier for the first 12 months. For more information, please visit AWS Free Tier.
How to create a new account - Create and deactivate an AWS account.
Then sign in to the AWS console: AWS Sign-in.
Storage space for training data and artifacts is vital for training a model in Amazon SageMaker. We use Amazon S3 buckets for storage because of their high performance and integration with other AWS services. Amazon S3 is the object storage service provided by AWS. The training data and the compiled model are stored in an S3 bucket. S3 buckets can be created from the console.
Select the same region for the S3 bucket and the SageMaker notebook instance; otherwise an error occurs. Regions are physically isolated geographic areas. Resources in one region are not replicated to other regions unless we specify it (see AWS Region Concept). Certain AWS services are region-based (see AWS region based services).
Please follow the steps on the page Amazon S3 Bucket to create an S3 bucket.
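The bucket can also be created from code instead of the console. The following is a minimal sketch using Boto3, not a replacement for the linked steps. The bucket name my-sagemaker-bucket is a placeholder, and the region eu-central-1 (Frankfurt) is only assumed because the notebook instance in this example is created there. The helper names bucket_request and create_bucket are our own.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for the Boto3 S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the S3 default region and must not be passed
    # as a LocationConstraint; every other region must be.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket in the given region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so bucket_request works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))


# Placeholder name; assumed region of the notebook instance.
print(bucket_request("my-sagemaker-bucket", "eu-central-1"))
```

Calling create_bucket("my-sagemaker-bucket", "eu-central-1") with valid credentials would then create the bucket in the same region as the notebook instance.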
We can reach Amazon SageMaker from the AWS Management Console. Click on Services and search for SageMaker. Choose the option Amazon SageMaker,
or open Amazon SageMaker directly from the link Amazon SageMaker Console.
Pin the required resources to the AWS Services tab for future use. Click on the Pin symbol on the tab and drag the resources to the tab.
How to select the desired region.
Stop the notebook instance when you are not using it. AWS charges for notebook instances while they are in service. The instructions for stopping a notebook instance are given later in this document.
Follow the steps to create a notebook instance. The following notebook instance is created in the Frankfurt region.
From the SageMaker dashboard, choose Notebook instances.
Next, create a notebook instance
Define the notebook instance by giving the name, the instance type, and permissions.
Permissions by IAM role - The AWS Identity and Access Management (IAM) service controls access to AWS services.
Access to different resources can be restricted using an IAM role. The IAM role used for the SageMaker notebook instance should have access to SageMaker and to the S3 buckets. How Amazon SageMaker works with IAM roles
For a notebook instance, we can either use an existing IAM role or create a new one.
Create a new IAM role if none exists. If predefined IAM roles exist, select one according to its S3 bucket access. Move to Option 2 if you would like to use an existing IAM role in your account.
Option 1 - Create a new IAM Role
Select Permissions and Encryption → IAM role → Create an IAM role.
There are two options for the S3 bucket selection. Any S3 bucket allows access to all the S3 buckets in your account and is recommended for beginners. Specific buckets (optional) provides access only to the listed buckets; buckets that are not on the list are not accessible from this notebook instance, which improves data safety. Both options are illustrated below. Select one of them.
Any S3 bucket option
Specific S3 bucket option
Example of the Specific buckets option - The field expects no special format, so simply enter the name of the bucket in the box. Multiple buckets can be added, separated by commas. The S3 bucket solkit-images contains the images for mask detection.
Finally, click on Create role.
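To make the Specific buckets idea more concrete, the sketch below builds an IAM policy document that grants access to a fixed list of buckets only. It is an illustration under our own assumptions and not necessarily the exact policy the console generates. The helper name s3_policy_for_buckets and the chosen set of S3 actions are ours; the bucket names are the ones used in this example.

```python
import json


def s3_policy_for_buckets(bucket_names):
    """Return an IAM policy document limited to the listed S3 buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Reading and writing applies to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


policy = s3_policy_for_buckets(["solkit-images", "evs-smk"])
print(json.dumps(policy, indent=2))
```

Any bucket that is missing from the list does not appear in a Resource entry, so the role cannot reach it.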
Option 2 - Use an existing role
Select Permissions and Encryption → IAM role → Use existing role.
Select an IAM role from the list. Note that an existing IAM role may have access to certain S3 buckets only.
Check the existing AmazonSageMaker-ExecutionRole roles. To do so, go to the IAM Console and select the desired role. A sample SageMaker role is shown below. We can view the existing policies for this role: click on the arrow symbol next to the policy name to view the policy (see IAM example policies). An IAM role can also be created separately and its role ARN used for an AWS service - How to create an IAM role.
So, either select a role that has access to the required bucket or change the bucket access permissions of the IAM role.
Encryption keys are optional. The keys are managed by AWS Key Management Service (AWS KMS), a security service provided by AWS.
Networks, Git, and Tags are optional. https://imaginghub.com/projects/447-additional-configurations-for-sagemaker-notebook-instance explains these options.
Finally, click on Create notebook instance.
The status column shows the state of a notebook instance. After a short while, the new notebook instance appears in the SageMaker Notebook instances list with the status InService.
Stop the notebook instance when not in use.
How to stop a notebook instance - When you are finished with your work, go to Amazon SageMaker → Notebook instances. Select the instance by clicking on the round button and click on the Stop option in the Actions menu. Stopping preserves the contents of the notebook instance.
Add a lifecycle configuration to the notebook instance so that it stops automatically when JupyterLab is idle.
How to delete a notebook instance - Delete removes the notebook instance and its contents. Download the necessary data from the instance before deleting it.
How to start a notebook instance - We can restart a stopped notebook instance. Select the desired notebook instance and click on Actions → Start.
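Stopping can also be scripted. The sketch below checks the status of a notebook instance and stops it only when it is in service. It assumes a Boto3 SageMaker client, whose describe_notebook_instance and stop_notebook_instance calls it uses; the helper name stop_if_running and the instance name mask-detection-notebook are hypothetical. A small stand-in client is included so that the sketch runs without AWS access.

```python
def stop_if_running(client, name):
    """Stop the notebook instance if it is in service; return the status seen."""
    response = client.describe_notebook_instance(NotebookInstanceName=name)
    status = response["NotebookInstanceStatus"]
    if status == "InService":
        client.stop_notebook_instance(NotebookInstanceName=name)
    return status


class OfflineClient:
    """Stand-in for boto3.client("sagemaker") so the sketch runs offline."""

    def __init__(self, status):
        self.status = status
        self.stopped = []

    def describe_notebook_instance(self, NotebookInstanceName):
        return {"NotebookInstanceStatus": self.status}

    def stop_notebook_instance(self, NotebookInstanceName):
        self.stopped.append(NotebookInstanceName)


client = OfflineClient("InService")
seen = stop_if_running(client, "mask-detection-notebook")
print(seen, client.stopped)
```

With real credentials, the stand-in would be replaced by boto3.client("sagemaker").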
Next, we have to perform the ML task using the Jupyter notebook. A Jupyter notebook combines documentation and code.
Select the notebook instance. Open it by clicking on Open Jupyter or Open JupyterLab next to the respective notebook instance, or select the instance, go to Actions, and click on Open Jupyter or Open JupyterLab.
Choose Jupyter or JupyterLab according to your preference.
Of the two options, we suggest Open JupyterLab, since it is more versatile. A comparison between the two options is given below.
|Open Jupyter||Open JupyterLab|
The next step is the preparation of the dataset. The dataset contains images of persons with and without a mask. In the training example provided here, the dataset is already labeled. The task is object detection, so the labels contain the bounding boxes and classes. Amazon SageMaker Ground Truth was used for labeling. Ground Truth saves the labels in a .manifest file.
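For orientation, a Ground Truth bounding-box manifest is a JSON Lines file: each line describes one image through a source-ref entry and a label attribute holding the annotations. The sketch below reads such lines. The attribute name mask-labels and the sample record are invented for illustration (Ground Truth typically names the attribute after the labeling job), so check labels.manifest for the actual name.

```python
import json


def read_boxes(manifest_lines, label_attribute):
    """Yield (image_uri, [(class_id, left, top, width, height), ...]) per line."""
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        annotations = record[label_attribute]["annotations"]
        boxes = [
            (a["class_id"], a["left"], a["top"], a["width"], a["height"])
            for a in annotations
        ]
        yield record["source-ref"], boxes


# Invented sample record; real records come from labels.manifest.
sample_record = {
    "source-ref": "s3://solkit-images/example.jpg",
    "mask-labels": {
        "image_size": [{"width": 640, "height": 480, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 10, "top": 20, "width": 50, "height": 60}
        ],
    },
}
parsed = list(read_boxes([json.dumps(sample_record)], "mask-labels"))
print(parsed)
```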
Skip this step, if you would like to use the existing labeled dataset for this example.
Mask detection images can be downloaded from the S3 bucket solkit-images. If you would like to download the images to your PC, please follow the steps below.
Open a terminal on your PC
If you do not have awscli installed on your PC, install it first
Enter the command to copy the data to your PC - aws s3 cp --recursive s3://solkit-images ****
To use your own dataset, collect the data and then label it. Data labeling can be done manually or by a paid service. Amazon SageMaker Ground Truth is a paid service for data labeling. If you would like to label the data yourself, please follow the steps in Data Labeling.
We use the Amazon SageMaker Python SDK to train and compile models. With it, we can train and deploy models using popular deep learning frameworks, Amazon built-in algorithms, or our own algorithms packaged for SageMaker. We use the Estimator class (SageMaker Python SDK - Estimator) to train, compile, and deploy models. The training algorithm or code is supplied to the Estimator - How to train a model using Amazon SageMaker Python SDK.
The different options for the training code or algorithm are:
Amazon SageMaker algorithms - Amazon SageMaker has built-in algorithms for training and inference, chosen according to the ML task: Amazon SageMaker built-in algorithms - Documentation.
Use custom code to train with deep learning frameworks - The Amazon SageMaker Python SDK, together with deep learning libraries such as TensorFlow, MXNet, and PyTorch, performs training using estimators and compilation using models: SageMaker pre-built containers for deep learning frameworks - Documentation. (This method is used here.)
Use algorithms and model packages in AWS Marketplace - Find Models and Packages in AWS Marketplace - Documentation.
Use your own algorithms - Create your own algorithm - Documentation.
The initial step in training a model is notebook preparation. Download the files from the link Mask detection. The files are the notebook (object-detection-mask.ipynb), the training code (train_mask_detection.py), and the data labels (labels.manifest).
Upload the notebook and files to JupyterLab.
For more information about JupyterLab and working with notebooks, please refer to How to work with notebooks in JupyterLab.
Open the notebook. A dialog box appears for the kernel selection. Select the kernel conda_mxnet_py36.
The notebook contains documentation and code cells. The documentation contains general information and the steps to follow to run each cell. Run the cells.
How to run a cell
Select the cell by clicking it and run it by clicking the Run button.
During code execution [*] appears next to the cell, and a number appears once the execution is complete.
The important steps in training and compiling the mask detection model for an edge device are explained here.
First, import the necessary Python modules, e.g. for the framework (MXNet, GluonCV) and for matrix operations (NumPy). Install the libraries if they are not present.
Enter the name of your S3 bucket where your data and labels have to be stored. Set the desired S3 bucket as the default bucket. In this example, 'evs-smk' is the default bucket.
Create a SageMaker session for S3 bucket access and get the IAM role to run the training job. Give the path to the training images and labels in your S3 bucket.
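The session and path setup could look roughly as follows. This is a sketch under assumptions, not the code of the sample notebook: the helper names s3_uri and open_session are ours, and the folder training-models/datasets/mask-detection is the one used for this example's data. The SageMaker import happens inside the function, so the path helper also runs outside a notebook instance.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI."""
    key = "/".join(p.strip("/") for p in parts if p)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def open_session(bucket):
    """Return (session, role); needs the sagemaker package and a notebook role."""
    import sagemaker  # lazy import: only required when talking to AWS

    session = sagemaker.Session(default_bucket=bucket)
    role = sagemaker.get_execution_role()
    return session, role


BUCKET = "evs-smk"  # default bucket of this example; replace with your own
train_uri = s3_uri(BUCKET, "training-models/datasets/mask-detection")
print(train_uri)
```

On a notebook instance, open_session(BUCKET) then returns the session and the IAM role attached to the instance.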
Skip this step if you are using your own dataset. Otherwise, download the training data for mask detection. You can download the entire dataset or a part of it to your S3 bucket folder. Set COPY_ENTIRE_DATA to True to download the complete data, or set IMAGES to get a part of the data.
Next, download the data.
Upload the .manifest file to the labels data folder of the S3 bucket.
Create an instance of the MXNet Estimator class (Amazon SageMaker Python SDK - MXNet Estimator Class). Set the hyperparameters for learning, the role, the number of EC2 instances, and the EC2 instance type for training.
The parameters assigned in MXNet() are
EC2 instance type - For object detection, GPU instances are preferred, e.g.
ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge and ml.p3.16xlarge.
MXNet framework version
Hyperparameters - Epochs, Data shape, Learning rate, Learning rate decay, and Batch size.
Set the training data path and the label path. Make sure the training data and labels are uploaded to the S3 bucket. Here the labels and the training data are in the same bucket folder, 's3://evs-smk/training-models/datasets/mask-detection/'. Train the model by calling fit().
Some users may not have access to certain EC2 instance types. This causes an error, and training cannot proceed. Request a quota increase for the instance type if you do not have access to it; some user accounts cannot use GPU instances until they request it.
My Service Quotas → Amazon Elastic Compute Cloud (Amazon EC2) → Search for the instance type (e.g. type p2 in the Search column) → Request quota increase
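The estimator setup and the fit() call might be sketched as below. Several details are assumptions: the argument names instance_count and instance_type follow version 2 of the SageMaker Python SDK (older versions use train_instance_count and train_instance_type), the hyperparameter names and values are illustrative and must match what train_mask_detection.py expects, and the channel name train is a guess. The framework and Python versions are left as parameters because they depend on your environment.

```python
def estimator_settings(role, framework_version, py_version,
                       instance_type="ml.p3.2xlarge"):
    """Collect the arguments for the MXNet estimator in one place."""
    return {
        "entry_point": "train_mask_detection.py",  # training code from the download
        "role": role,
        "instance_count": 1,
        "instance_type": instance_type,            # a GPU instance type
        "framework_version": framework_version,
        "py_version": py_version,
        "hyperparameters": {                       # illustrative names and values
            "epochs": 10,
            "data-shape": 512,
            "learning-rate": 0.001,
            "lr-decay": 0.1,
            "batch-size": 8,
        },
    }


def train(role, framework_version, py_version, train_uri):
    """Create the estimator and start training (needs the sagemaker package)."""
    from sagemaker.mxnet import MXNet  # lazy import

    estimator = MXNet(**estimator_settings(role, framework_version, py_version))
    estimator.fit({"train": train_uri})  # channel name "train" is an assumption
    return estimator
```

Keeping the settings in a plain dictionary makes it easy to print and review them before a paid training job is started.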
The current status of the training can be viewed from Amazon SageMaker → Training → Training jobs
A list of training jobs appears. A running job has the status In Progress. If you would like to stop the job, select the training job name → Stop.
In order to view the logs or monitor the hardware usage, select the job.
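The job status can also be polled from code. The sketch below waits until a training job reaches a final state. It expects a callable that behaves like the describe_training_job method of a Boto3 SageMaker client; the helper name wait_for_job, the job name demo-job, and the canned responses are made up so that the sketch runs without AWS access.

```python
import time

FINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(describe, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll the training job until it reaches a final state; return that state."""
    while True:
        status = describe(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in FINAL_STATES:
            return status
        sleep(poll_seconds)


# Canned responses standing in for the real service.
responses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe(TrainingJobName):
    return {"TrainingJobStatus": next(responses)}


result = wait_for_job(fake_describe, "demo-job", sleep=lambda seconds: None)
print(result)
```

With real credentials, boto3.client("sagemaker").describe_training_job would take the place of fake_describe.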
Compile the trained model. Here the target device is Jetson Nano. Model compilation can be performed using the Estimator class.
The sample notebook contains the SageMaker Neo compilation using the Estimator class. SageMaker Neo compiles the model for optimal performance on the edge device. Assign the target device, input shape, and framework. Finally, run the cell.
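As a rough sketch, the compilation call on a trained estimator could look like this. The input name data, the 512 x 512 input shape, and the helper names are assumptions and have to match the trained network; the output folder s3://evs-smk/jetson-nano-models is the one used in this example. The framework version is left as a parameter.

```python
def neo_settings(output_uri, framework_version, data_shape=512):
    """Collect the arguments for compile_model when targeting a Jetson Nano."""
    return {
        "target_instance_family": "jetson_nano",
        # One RGB image per batch; input name and size are assumptions.
        "input_shape": {"data": [1, 3, data_shape, data_shape]},
        "output_path": output_uri,
        "framework": "mxnet",
        "framework_version": framework_version,
    }


def compile_for_jetson(estimator, output_uri, framework_version):
    """Start the Neo compilation job for an already trained estimator."""
    return estimator.compile_model(**neo_settings(output_uri, framework_version))


print(neo_settings("s3://evs-smk/jetson-nano-models", "1.8"))
```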
The Neo compilation job status can be viewed as given below.
The compilation job details can be seen by selecting the specific job name.
Next, download the compiled model to the notebook instance and add classes to the model. The classes.lst file is added to the compiled model.
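Adding the class list amounts to unpacking the compiled archive, writing classes.lst next to the compiled files, and packing everything again. The sketch below does this with the Python standard library. The class names, the one-name-per-line layout of classes.lst, and the stand-in archive are assumptions for illustration; a real archive is the one produced by the compilation job.

```python
import tarfile
import tempfile
from pathlib import Path


def add_classes(compiled_tar, classes, output_tar):
    """Repack a compiled model archive with an additional classes.lst file."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        with tarfile.open(compiled_tar, "r:gz") as archive:
            archive.extractall(workdir)
        # One class name per line (assumed layout).
        (workdir / "classes.lst").write_text("\n".join(classes) + "\n")
        with tarfile.open(output_tar, "w:gz") as archive:
            for path in sorted(workdir.iterdir()):
                archive.add(path, arcname=path.name)
    return output_tar


# Demonstration with a stand-in archive instead of a real compiled model.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "compiled.params").write_bytes(b"placeholder")
source = demo_dir / "compiled-model.tar.gz"
with tarfile.open(source, "w:gz") as archive:
    archive.add(demo_dir / "compiled.params", arcname="compiled.params")

final = add_classes(source, ["mask", "no-mask"], demo_dir / "mask-det-model.tar.gz")
with tarfile.open(final, "r:gz") as archive:
    names = sorted(archive.getnames())
print(names)
```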
'mask-det-model.tar.gz' is our final model. Upload this model to the desired S3 bucket for your future use.
The model is available at s3://evs-smk/jetson-nano-models/mask-det-model.tar.gz.
Use this model on your Jetson Nano for inference.
There are two ways to deploy the compiled model to the AI Vision Solution Kit:
First, edit the public access settings of the bucket. Open the bucket and edit the Permissions → Block public access section. Uncheck the options as shown below and confirm.
Make the compiled model (the .tar.gz file in S3) publicly available. To do so, click the model name in S3 and edit the Access control list. Select Public access → Everyone and check Read object. Save the changes.
Use the Object URL for the AI Vision Solution Kit in Step-9.
Download the compiled model to your local PC and copy it to the Jetson Nano. If you are using Linux you can use the following command in a terminal:
scp <model-tar-gz-file> <user>@<ip-address-jetson-nano>:
(On Windows, use a program like pscp.exe from PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html )
On the Jetson Nano extract the model archive:
mkdir model
tar xvf <model-tar-gz-file> -C model
(To log in to the Jetson Nano, you can use ssh on Linux or putty.exe on Windows.)
To use the model with the AI Vision Solution Kit, you have to add it as a processing unit as the final step. To start adding a new processing unit, use the following command:
ai-vision-solution-kit processing-unit add <unique-processing-unit-id>
The script will ask you for a Name, Docker Image (optional), Environment (optional) and a model location (either URL or Directory):
After adding the new processing unit, you can reload the web frontend, and your model will be available in the processing unit selection.