How to create, train and deploy a Machine Learning Model?

Create and deploy your Machine Learning Model with Amazon SageMaker


Amazon SageMaker is a fully integrated machine learning (ML) service for training, compiling, and deploying machine learning models. It reduces the complexity of the individual stages of an ML task: it helps to train models on large amounts of data, find suitable hyperparameters through hyperparameter tuning, and compile the model to run in the cloud or on edge devices.

ml_steps

Amazon SageMaker offers the following:

  • Easy access to the data

  • Ground truth labeling

  • Training and tuning of ML models

  • Deploy models to production

  • ML using Jupyter Notebooks

  • Support for the deep learning frameworks TensorFlow, Keras, PyTorch, and MXNet

SageMaker functionality is available through the high-level Amazon SageMaker Python SDK or through boto3, the AWS SDK for Python. Boto3 contains modules for the different AWS services. The Amazon SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker. Because of its higher-level interface, it is easy to use from a SageMaker notebook instance and is recommended for beginners.

Amazon SageMaker Notebook Instance

A SageMaker notebook instance is a fully managed ML virtual machine that runs the Jupyter Notebook App. It contains the resources needed for processing data and for training and deploying ML models. Training followed by deployment or Neo compilation can be done together in one Jupyter notebook. The notebook instance also supports code management using repositories, which allows access to the code from an external device.

sagemaker-summary

A model deployment on an edge device using a SageMaker Notebook instance includes the following steps:

  1. AWS Account

  2. S3 Bucket

  3. SageMaker Notebook Instance

  4. Jupyter Notebook

  5. Dataset

  6. Train model

  7. Compile model

  8. Deploy model

In this example, we deploy an object detection model on a Jetson Nano. The model detects persons with and without a mask. We train and compile the model using Amazon SageMaker and then deploy it on the Jetson Nano.

info Create an IAM user while signed in with the root user credentials. Please follow the steps on the page Create IAM User. For more information about the two kinds of user, see Root User vs IAM User.

iamusernote

Step -1: AWS Account

Create an AWS account if you do not have one. Creating an AWS account is free, and a new user has access to the AWS Free Tier for the first 12 months. For more information, please visit AWS Free Tier.

How to create a new account: Create and deactivate an AWS account.

Then sign in to the AWS console: AWS Sign-in.

Step -2: S3 Bucket

Storage space for training data and artifacts is vital for training a model in Amazon SageMaker. We use Amazon S3 buckets for storage because of their high performance and their integration with other AWS services. Amazon S3 is the cloud storage service provided by AWS. The training data and the compiled model are stored in an S3 bucket. S3 buckets can be created from the console.

info Select the same region for the S3 bucket and the SageMaker notebook instance; otherwise an error is thrown. Regions are physically isolated zones, and the resources in a region are not replicated to other regions unless we specify it (see AWS Region Concept). Certain AWS services are region-based (see AWS region based services).

Please follow the steps on the page Amazon S3 Bucket to create an S3 bucket.
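A bucket can also be created from code rather than from the console. The following is a minimal sketch, assuming the boto3 library and a hypothetical bucket name; it builds the arguments for the boto3 `create_bucket` call so that the bucket is created in the same region as the notebook instance. Outside `us-east-1`, S3 expects the region as a `LocationConstraint`.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for the boto3 S3 create_bucket call."""
    request = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def create_bucket(bucket_name, region):
    """Create the bucket; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so bucket_request() works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))


# "my-training-bucket" is a hypothetical name; eu-central-1 is the Frankfurt region.
print(bucket_request("my-training-bucket", "eu-central-1"))
```

Only `bucket_request` runs here; calling `create_bucket` would actually create the bucket in your account.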

Step -3: Create a Notebook Instance

We can reach Amazon SageMaker from the AWS Management Console. Click on Services and search for Sagemaker. Choose the option Amazon SageMaker.

open_sagemaker

or open Amazon SageMaker directly from the link Amazon SageMaker Console.

check Pin the required resources to the AWS Services tab for future use. Click on the Pin symbol on the tab and drag the resources to the tab.

pin_service

info How to select the desired region.

region

warning Stop the notebook instance when you are not using it. AWS charges for notebook instances while they are in service. The instructions for stopping a notebook instance are explained in the document.

Follow the steps to create a notebook instance. The following notebook instance is created in the Frankfurt region.

  • From the SageMaker dashboard, choose Notebook instances.

    notebook_instance

  • Next, create a notebook instance

create_notebookinstance

  • Define the notebook instance by giving the name, the instance type, and permissions.

    notebookinstance_step1

    • Notebook name- Unique name for notebook instance in an AWS region
    • Instance type- The sample notebook instance uses ml.t2.medium. A notebook instance runs on an EC2 (Elastic Compute Cloud) instance. Each EC2 instance type comprises a different combination of CPU, memory, storage, and networking capacity, so we can select the instance type according to our problem. ml.t2.medium is the least expensive instance type.
      AWS EC2 instance - Documentation
      AWS EC2 instance types
      EC2 instance pricing
    • Elastic inference (EI)- Optional, and only for regions that offer Elastic Inference; currently, this service is not available in the Frankfurt region. Elastic Inference attaches the right amount of GPU acceleration to the endpoint for inference. The cost is based on GPU usage, which makes it cost-efficient. Select EI if inference has to be run from the notebook instance; otherwise select None. Here our target device is the Jetson Nano, so select None.
      AWS Elastic inference types
      AWS Elastic Inference pricing
    • Additional configuration- EBS volume and Lifecycle configuration are optional. EBS volume enables data storage in a notebook instance. We can attach an EBS volume of the required size to our EC2 instance. Decide the size based on the data storage requirements and specify the EBS volume size in GB. In this example, an EBS volume of 5GB is used.
      Lifecycle configuration is optional. Additional Configurations for SageMaker Notebook Instance explains the lifecycle configuration.
  • Permissions by IAM Role- The Identity and Access Management (IAM) service controls access to the AWS services.
    The permissions for different resources can be restricted using an IAM role. The IAM role used for the SageMaker notebook instance should have access to SageMaker and to the S3 buckets. How Amazon SageMaker works with IAM roles
    For a notebook instance, we can either use an existing IAM role or create a new one.

    permissions

    Create a new IAM role if no IAM role exists yet (Option 1). If predefined IAM roles exist, select one according to the S3 bucket access it grants. Move to Option 2 if you would like to use an existing IAM role in your account.

    • Option 1 - Create a new IAM Role
      Select Permissions and Encryption → IAM role → Create an IAM role.

      There are two options for the S3 bucket selection. Any S3 Bucket allows access to all the S3 buckets in your account and is recommended for beginners. Specific buckets (optional) provides access only to the listed buckets; buckets that are not on the list are not accessible from this notebook instance, which helps to keep the data safe. Both options are illustrated below. Select one of them.

      Any S3 bucket option create_iam_rule

      Specific S3 bucket option specific_s3

      info Example of the Specific buckets option - No particular format is required for the entry, so simply enter the name of the bucket in the box. Multiple buckets can be added, separated by commas. The S3 bucket solkit-images contains the images for mask detection.

      Finally, click on Create role.

    • Option 2 - Use an existing role
      Select Permissions and Encryption → IAM role → Use existing role.
      Select an IAM role from the list. Note that an existing IAM role may have access to certain S3 buckets only.

      info Check the existing AmazonSageMaker-ExecutionRoles. For this, go to the IAM Console and select the desired role. A sample SageMaker role is shown below. We can view the existing policies for this role: click on the arrow symbol next to the policy name to view the policy (see IAM example policies). An IAM role can also be created separately, and its Role ARN can then be used for the AWS service - How to create an IAM role.

      iam_policy

      So, either select a role that already has access to the required bucket or change the bucket access permissions of the IAM role.

      bucket_selection

    • Encryption keys are optional. It is a security service provided by AWS.

  • Networks, Git, and Tags are optional. https://imaginghub.com/projects/447-additional-configurations-for-sagemaker-notebook-instance explains these options.

  • Finally, click on Create notebook instance.

    create_instance

The status column shows the state of a notebook instance. After a short while, the new notebook instance appears on the SageMaker Notebook instances page with the status InService.

notebook_status
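The console settings above map onto a single API request. The sketch below assumes boto3 and uses a hypothetical instance name and role ARN; it collects the name, instance type, IAM role, and EBS volume size chosen in this step into the arguments of the boto3 SageMaker `create_notebook_instance` call.

```python
def notebook_instance_request(name, role_arn, instance_type="ml.t2.medium", volume_gb=5):
    """Collect the console settings from this step into one request."""
    return {
        "NotebookInstanceName": name,   # unique per AWS region
        "InstanceType": instance_type,  # ml.t2.medium as in this example
        "RoleArn": role_arn,            # IAM role with SageMaker and S3 access
        "VolumeSizeInGB": volume_gb,    # EBS volume size, 5 GB in this example
    }


def create_notebook_instance(request, region="eu-central-1"):
    """Send the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("sagemaker", region_name=region)
    return client.create_notebook_instance(**request)


# The name and the role ARN below are hypothetical placeholders.
request = notebook_instance_request(
    "mask-detection-notebook",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request)
```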

warning Stop the notebook instance when not in use.
How to stop a notebook instance - When you are finished with the work, go to Amazon SageMaker → Notebook instances. Select the instance by clicking on the round button and click on the Stop option in the Actions menu. Stopping the instance preserves the contents of the notebook instance.

stop_notebook
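Stopping can also be scripted. The sketch below assumes a boto3 SageMaker client and a hypothetical instance name; it checks the status with `describe_notebook_instance` and calls `stop_notebook_instance` only when the instance is InService. A small stand-in client is included so the example runs without AWS access.

```python
def stop_if_running(client, name):
    """Stop the notebook instance if it is InService; return the status seen."""
    response = client.describe_notebook_instance(NotebookInstanceName=name)
    status = response["NotebookInstanceStatus"]
    if status == "InService":
        client.stop_notebook_instance(NotebookInstanceName=name)
    return status


class FakeSageMakerClient:
    """Offline stand-in for boto3.client('sagemaker'), for illustration only."""

    def __init__(self, status):
        self.status = status
        self.stopped = []

    def describe_notebook_instance(self, NotebookInstanceName):
        return {"NotebookInstanceStatus": self.status}

    def stop_notebook_instance(self, NotebookInstanceName):
        self.stopped.append(NotebookInstanceName)
        self.status = "Stopping"


fake = FakeSageMakerClient("InService")
stop_if_running(fake, "mask-detection-notebook")  # hypothetical instance name
print(fake.stopped)
```

With real credentials, pass `boto3.client("sagemaker")` instead of the stand-in.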

check Add a lifecycle configuration to the notebook instance so that it stops automatically when JupyterLab is idle. See Additional Configurations for SageMaker Notebook Instance.

info How to delete a notebook instance- The Delete option removes the notebook instance together with its contents. Download the necessary data from the instance before deleting it.

How to start a notebook instance- We can restart a stopped notebook instance. Select the desired notebook instance and click on Actions → Start.

notebook_status

Step -4: Jupyter Notebook

Next, we perform the ML task in a Jupyter notebook. A Jupyter notebook combines documentation and code.

notebook_status

Open the notebook instance either by clicking on Open Jupyter/Open Jupyterlab next to the respective notebook instance, or by selecting the instance and choosing Open Jupyter/Open Jupyterlab from the Actions menu.

Choose Jupyter or Jupyterlab according to your preference.

info Get familiar with Jupyterlab - How to work with notebooks in Jupyterlab. To familiarise yourself with Jupyter, see Jupyter tutorial - Documentation

Of the two options, we suggest Open Jupyterlab, since it is more versatile. The two options are compared below.

Open Jupyter Open JupyterLab
jupyter jupyterlab

Step-5: Prepare dataset

The next step is the preparation of the dataset. The dataset contains images of persons with and without a mask. The dataset used in this training example is already labeled. Since the task is object detection, the labels consist of bounding boxes and classes. AWS Ground Truth was used for labeling; it saves the labels in a .manifest file.
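To make the label format concrete, here is a minimal sketch that reads one line of a Ground Truth bounding-box manifest. It assumes the usual JSON Lines layout of Ground Truth object detection output (a `source-ref` entry plus a label attribute holding `annotations`). The label attribute name `mask-labels`, the image path, and the box values are hypothetical; check your own .manifest file for the actual names.

```python
import json


def parse_manifest_line(line, label_key):
    """Return the image URI and its boxes as (xmin, ymin, xmax, ymax) entries."""
    record = json.loads(line)
    class_map = record.get(label_key + "-metadata", {}).get("class-map", {})
    boxes = []
    for ann in record[label_key]["annotations"]:
        class_id = str(ann["class_id"])
        boxes.append({
            "class": class_map.get(class_id, class_id),
            "xmin": ann["left"],
            "ymin": ann["top"],
            "xmax": ann["left"] + ann["width"],
            "ymax": ann["top"] + ann["height"],
        })
    return record["source-ref"], boxes


# Hypothetical manifest line for illustration; real files hold one such line per image.
sample = json.dumps({
    "source-ref": "s3://solkit-images/example.jpg",
    "mask-labels": {
        "image_size": [{"width": 640, "height": 480, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 100, "top": 50, "width": 80, "height": 120},
        ],
    },
    "mask-labels-metadata": {"class-map": {"0": "mask"}},
})

image_uri, boxes = parse_manifest_line(sample, "mask-labels")
print(image_uri, boxes)
```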

Skip this step, if you would like to use the existing labeled dataset for this example.

Mask detection images can be downloaded from the S3 bucket solkit-images. If you would like to download the images to your PC, please follow the steps below.

  • Open a terminal on your PC

  • Install awscli if it is not already installed on your PC

  • Enter the command to copy the data to your PC - aws s3 cp --recursive s3://solkit-images ****

In order to use your own dataset, collect the data, and then label the data. Data labeling can be done manually or by a paid service. AWS Ground Truth is a paid service for data labeling. If you would like to label the data yourself, please follow the steps in Data Labeling.

Step-6: Train model

We use Amazon SageMaker Python SDK to train and compile models. Using Amazon SageMaker Python SDK, we can train and deploy models using popular deep learning frameworks, Amazon built-in algorithms, or our own algorithms built into SageMaker. We use the Estimator (SageMaker Python SDK - Estimator) class to train, compile and deploy models. The training algorithm or code is supplied to Estimator - How to train a model using Amazon SageMaker Python SDK.

info The different options for training code or algorithm are

The first step in training a model is preparing the notebook. Download the files from the link Mask detection. The files are the notebook (object-detection-mask.ipynb), the training code (train_mask_detection.py), and the data labels (labels.manifest).

  1. Upload the notebook and files to Jupyterlab.

    upload

    For more information about Jupyterlab and working with notebooks, please refer to How to work with notebooks in Jupyterlab

  2. Open notebook. Then a dialogue box appears for the kernel selection. Select the kernel conda_mxnet_py36.

    kernel_list

    The notebook contains the documentation and code cells. The documentation contains the general information and steps that should be followed to run the cell. Run the cells.

    info How to run a cell
    Select the cell by clicking it and Run the cell by clicking the run button.

    runcell

    During code execution, [*] appears next to the cell; once the execution is complete, a number appears in the brackets instead.

    The important steps in training and compiling the mask detection model for an edge device are explained here.

    1. First, import the necessary Python modules, e.g. for the framework (MXNet, GluonCV) and for matrix operations (NumPy). Install the libraries if they are not present.

      import_modules

    2. Enter the name of your S3 bucket where your data and labels have to be stored. Set the desired S3 bucket as the default bucket. In this example, 'evs-smk' is the default bucket.

      set_defaultbucket

    3. Create a sagemaker session for S3 bucket access and get the IAM role to run the training job. Give the path to training images and labels in your S3 bucket.

      label_path

    4. Skip this step, if you are using your own dataset. Otherwise, download the training data for mask detection. You can download the entire dataset or a part of it to your S3 bucket folder. Set COPY_ENTIRE_DATA to True for downloading complete data or set IMAGES to get a part of data.

      copy_S3data

      Next download data.

      download_traindata

    5. Upload the .manifest file to the labels folder in the S3 bucket.

      upload_manifest

    6. Create an MXNet Estimator class instance (Amazon SageMaker Python SDK - MXNet Estimator Class). Set the hyperparameters for learning, the role, the number of EC2 instances, and the EC2 instance type for training. The parameters passed to MXNet() are:

      1. Python code for training
      2. S3 location to store the output
      3. IAM role
      4. Number of EC2 instances for training
      5. EC2 instance type - For object detection, GPU instances are preferred.
      6. MXNet framework version
      7. Python version
      8. Hyperparameters - Epochs, Data shape, Learning rate, Learning rate decay, and Batch size.

        estimator

    7. Set the training data path and the label path. Make sure the training data and labels are uploaded to the S3 bucket, then give the paths to your training data and label file. Here the labels and training data are in the same bucket folder 's3://evs-smk/training-models/datasets/mask-detection/'. Train the model by calling fit().
      fit

      warning Some user accounts do not have access to certain EC2 instance types, for example GPU instances. In that case an error occurs and the training cannot proceed. Request a quota increase for the instance type if you do not have access to it:

      My Service Quotas → Amazon Elastic Compute Cloud (EC2) → Search for the instance type (e.g. type p2 in the Search column) → Request quota increase

      service_quota

      The current status of the training can be viewed from Amazon SageMaker → Training → Training jobs

      training_jobs

      A list of training jobs appears. The running job has the status In Progress. If you would like to stop the job, select the training job name → Stop.

      train-jobs

      In order to view the logs or monitor the hardware usage, select the job.

      train-job
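The estimator setup and the fit() call from steps 6 and 7 can be sketched as follows. This is a minimal sketch and not the notebook's exact code. It assumes version 2 of the SageMaker Python SDK, where the arguments are named `instance_count` and `instance_type`. The instance type, framework version, Python version, hyperparameter names and values, output prefix, role ARN, and the channel name `train` are all placeholders chosen for illustration.

```python
def estimator_arguments(role, output_path):
    """Collect the arguments listed in step 6 for the MXNet estimator."""
    return {
        "entry_point": "train_mask_detection.py",  # Python code for training
        "output_path": output_path,                # S3 location to store the output
        "role": role,                              # IAM role
        "instance_count": 1,                       # number of EC2 instances
        "instance_type": "ml.p2.xlarge",           # assumed GPU instance type
        "framework_version": "1.6.0",              # assumed MXNet version
        "py_version": "py3",                       # assumed Python version
        "hyperparameters": {                       # assumed names and values
            "epochs": 10,
            "data_shape": 512,
            "learning_rate": 0.001,
            "lr_decay": 0.1,
            "batch_size": 8,
        },
    }


def train(role, output_path, data_s3_uri):
    """Start the training job; needs the sagemaker package and AWS access."""
    from sagemaker.mxnet import MXNet  # lazy import keeps the helper usable offline

    estimator = MXNet(**estimator_arguments(role, output_path))
    estimator.fit({"train": data_s3_uri})  # "train" is an assumed channel name
    return estimator


args = estimator_arguments(
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # hypothetical ARN
    output_path="s3://evs-smk/training-models/output/",          # hypothetical prefix
)
print(sorted(args))
```

The hyperparameter names must match what the training script actually parses, so take them from train_mask_detection.py rather than from this sketch.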

Step-7: Compile model

Compile the trained model. Here the edge device is a Jetson Nano, so the model is compiled for the ARM platform with the corresponding compiler options. Model compilation can be performed using the Estimator class.

The sample notebook contains the SageMaker Neo compilation using the Estimator class. SageMaker Neo compiles the model for optimal performance on the edge device. Assign the target platform features, input shape, and framework. Finally, run the cell.

compile_model
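To show which settings a Neo compilation job needs, the sketch below builds the request for the boto3 `create_compilation_job` call. This is a different route from the Estimator class used in the sample notebook, chosen here because the request is plain data. The job name, role ARN, S3 paths, input name `data`, input shape, and the TensorRT and CUDA versions are assumptions; the compiler options must match the software installed on your Jetson Nano.

```python
import json


def compilation_job_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            input_shape, compiler_options):
    """Build a Neo compilation request for an ARM64 Linux device with an NVIDIA GPU."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # "data" is the assumed name of the model's input tensor.
            "DataInputConfig": json.dumps({"data": input_shape}),
            "Framework": "MXNET",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64", "Accelerator": "NVIDIA"},
            "CompilerOptions": json.dumps(compiler_options),
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def start_compilation(request, region="eu-central-1"):
    """Submit the job; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client("sagemaker", region_name=region).create_compilation_job(**request)


request = compilation_job_request(
    job_name="mask-det-neo-example",                                  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",   # hypothetical
    model_s3_uri="s3://evs-smk/training-models/output/model.tar.gz",  # hypothetical
    output_s3_uri="s3://evs-smk/jetson-nano-models/",
    input_shape=[1, 3, 512, 512],                                     # assumed shape
    # Assumed versions; use the ones that match your device.
    compiler_options={"gpu-code": "sm_53", "trt-ver": "7.1.3", "cuda-ver": "10.2"},
)
print(json.dumps(request["OutputConfig"], indent=2))
```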

The Neo compilation job status can be viewed as shown below.

compilation_jobs

The compilation job details can be seen by selecting the specific job name.

select_compilation_job

Next, download the compiled model to the notebook instance and add classes to the model. The classes.lst file is added to the compiled model.

add_classes
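Adding the classes file amounts to repacking the archive. The sketch below uses only the Python standard library; it copies every member of the compiled archive into a new archive and appends classes.lst. The file names inside the demo archive and the class names are made up for illustration, and the notebook may perform this step differently.

```python
import os
import tarfile
import tempfile


def add_file_to_model(compiled_tar, extra_file, output_tar):
    """Copy all members of compiled_tar into output_tar and append extra_file."""
    with tarfile.open(compiled_tar, "r:gz") as src, tarfile.open(output_tar, "w:gz") as dst:
        for member in src.getmembers():
            # Regular files need their content stream; directories do not.
            content = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, content)
        dst.add(extra_file, arcname=os.path.basename(extra_file))
    return output_tar


def demo():
    """Build a dummy compiled archive, add classes.lst, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        weights = os.path.join(tmp, "compiled.params")  # dummy model file
        with open(weights, "wb") as handle:
            handle.write(b"dummy")
        compiled = os.path.join(tmp, "compiled.tar.gz")
        with tarfile.open(compiled, "w:gz") as archive:
            archive.add(weights, arcname="compiled.params")
        classes = os.path.join(tmp, "classes.lst")
        with open(classes, "w") as handle:
            handle.write("mask\nno-mask\n")  # hypothetical class names
        final = add_file_to_model(compiled, classes, os.path.join(tmp, "mask-det-model.tar.gz"))
        with tarfile.open(final, "r:gz") as archive:
            return sorted(archive.getnames())


print(demo())
```

The output archive must be a different file from the input archive, since both are open at the same time.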

'mask-det-model.tar.gz' is our final model. Upload this model to the desired S3 bucket for your future use.

upload_finalmodel

The model is available in the folder s3://evs-smk/jetson-nano-models/mask-det-model.tar.gz.
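The upload can be sketched with the boto3 S3 client and its `upload_file` method. The helper that splits an `s3://` URI into bucket and key is plain Python and works offline. The local file name is taken from the text above; everything else is an illustration and not necessarily how the notebook performs the upload.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("not an S3 URI: " + uri)
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("S3 URI needs a bucket and a key: " + uri)
    return bucket, key


def upload_model(local_path, s3_uri, client=None):
    """Upload local_path to s3_uri; needs boto3 and credentials unless a client is given."""
    if client is None:
        import boto3  # imported lazily so split_s3_uri() works without boto3

        client = boto3.client("s3")
    bucket, key = split_s3_uri(s3_uri)
    client.upload_file(local_path, bucket, key)
    return bucket, key


print(split_s3_uri("s3://evs-smk/jetson-nano-models/mask-det-model.tar.gz"))
```

A call such as `upload_model("mask-det-model.tar.gz", "s3://evs-smk/jetson-nano-models/mask-det-model.tar.gz")` would then perform the actual upload.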

finalmodel

Use this model on your Jetson Nano for inference.

Step-8: Prepare Model Deployment

There are three ways to deploy the compiled model to the AI Vision Solution Kit:

  • Make the S3 object public and download it via the AI Vision Solution Kit
  • Use a private S3 bucket and download it via the AI Vision Solution Kit
  • Download the model manually and copy it to the board

Method 1 - Public S3 Object

First, edit the public access settings of the bucket. Open the bucket and edit the Permissions → Block public access section. Uncheck the options as shown below and confirm.

bucket-privacy

Make the compiled model (.tar.gz file in S3) publicly available. To do so, click the model name in S3 and edit the Access control list. For Everyone (public access), check the Read option for the object, then save the changes.

public-model

Use the Object URL of the model in Step-9 when adding it to the AI Vision Solution Kit.
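For orientation, the Object URL shown in the S3 console normally follows the virtual-hosted style pattern sketched below. The region `eu-central-1` is an assumption based on the Frankfurt notebook instance in this example; the URL displayed in the console remains the authoritative value.

```python
from urllib.parse import quote


def object_url(bucket, key, region):
    """Build a virtual-hosted style HTTPS URL for a public S3 object."""
    # quote() keeps "/" intact and escapes characters that are unsafe in URLs.
    return "https://{}.s3.{}.amazonaws.com/{}".format(bucket, region, quote(key))


print(object_url("evs-smk", "jetson-nano-models/mask-det-model.tar.gz", "eu-central-1"))
```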

Method 2 - Use a private S3 bucket

Create a new private S3 bucket or use an existing one. Upload your model .tar.gz-file to the bucket.

If the board is not yet connected to AWS, create a new IAM user for the board and generate access keys for it (see [https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey]). Save the access key ID and the secret access key in a secure location. If you have already connected your board with an IAM user, you only need to attach the correct permissions as described below.

Grant the IAM user permissions to download the model from the S3 bucket. At least "s3:GetObject" for the model .tar.gz file must be assigned to this user.
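A policy that grants exactly this permission could look like the document built below. It is a minimal sketch: the bucket and key are taken from the example path used earlier and should be replaced with the location of your own model file.

```python
import json


def model_read_policy(bucket, key):
    """Build an IAM policy document allowing s3:GetObject on one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::{}/{}".format(bucket, key),
            }
        ],
    }


policy = model_read_policy("evs-smk", "jetson-nano-models/mask-det-model.tar.gz")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the board's IAM user.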

Then, please execute ai-vision-solution-kit (without arguments) for configuring the access keys. It now asks to enter the access key, the secret access key and the region. Enter the keys from the IAM user you just created. For the region, please enter the region your S3 bucket is in.

If you already configured the access keys using an earlier version of the AI Vision Solution Kit, the command might ask you for the region only.

Please copy the S3 URI for the model (beginning with s3://) and save it somewhere because it is needed for Step 9.

Method 3 - Manually Copying to AI Vision Solution Kit

Download the compiled model to your local PC and copy it to the Jetson Nano. If you are using Linux you can use the following command in a terminal:

scp <model-tar-gz-file> <user>@<ip-address-jetson-nano>:

(On Windows, use a program like pscp.exe from PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html )

On the Jetson Nano extract the model archive:

mkdir model
tar xvf <model-tar-gz-file> -C model

(To log in to the Jetson Nano, you can use ssh on Linux or putty.exe on Windows.)

Step-9: Add the Model as New Processing Unit to the AI Vision Solution Kit

To use the model with the AI Vision Solution Kit, you have to add it as a processing unit as the final step. To start adding a new processing unit, use the following command:

ai-vision-solution-kit processing-unit add <unique-processing-unit-id>

The script will ask you for a Name, Docker Image (optional), Environment (optional) and a model location (public/private URL or Directory):

  • Name: This will be the name visible in the dropdown menu where the processing unit running on the AI Vision Solution Kit is selected.
  • Docker Image: use the default image-processing
  • Environment: use the default IMAGEPROCESSINGMODE=NEO_AI
  • URL: Public or private URL to the compiled model as S3 object (see Step-8 Public S3 Object and Step-8 Private S3 Bucket) - private URLs start with s3://, public URLs with https://
  • Directory: directory where you extracted the model on the Jetson Nano board (see Step-8 Manually Copying to AI Vision Solution Kit)
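The three kinds of model location can be told apart by their prefix, as described in the list above. The helper below only illustrates that rule; it is not the AI Vision Solution Kit's own logic, and the example paths are hypothetical.

```python
def model_location_kind(location):
    """Classify a model location by its prefix."""
    if location.startswith("s3://"):
        return "private S3 URI"
    if location.startswith("https://"):
        return "public URL"
    return "directory"


for example in (
    "s3://evs-smk/jetson-nano-models/mask-det-model.tar.gz",
    "https://evs-smk.s3.eu-central-1.amazonaws.com/jetson-nano-models/mask-det-model.tar.gz",
    "/home/user/model",  # hypothetical directory on the Jetson Nano
):
    print(model_location_kind(example), "-", example)
```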

After adding the new processing unit you can reload the web frontend and your model will be available in the selection of the processing unit.

Admins

SelenaS
emeusel
Nina_Boehm
MeyerMel
TimM
nwilson
jstoltz
kilian-hohm
TomE
abd
ath
