Mahesh Kumar

A guide to deploying Machine/Deep Learning model(s) in Production

Source: XKCD

There is a plethora of articles on Deep Learning (DL) and Machine Learning (ML) covering topics like data gathering, data munging, network/algorithm selection, training, validation, and evaluation. But one of the challenging problems in data science today is deploying the trained model in production, for any consumer-centric organization or individual who wants their solution to reach a wider audience.

Most of the time, energy, and resources are spent on training the model to achieve the desired results. Allocating additional time and energy afterwards to choose the computational resources and set up the infrastructure needed to replicate the model's results in a different environment (production), at scale, is a difficult task. Overall, it's a lengthy process that can easily take months, right from the decision to use DL to deploying the model.

Source: Algorithmia


This article tries to give a comprehensive overview of the entire process of deployment from scratch.

Illustration of the workflow (from client API requests to server prediction responses). You are free to use the image.

Note: The above image is just an illustration of a probable architecture and is used primarily for learning purposes.

Components

Let’s break down the above image that depicts the entire API workflow and understand every component.


Architecture Setup

By now you should be familiar with the components mentioned in the earlier sections. In the following section, let's understand the setup from an API perspective, since this forms the base for a web application as well.

Note: This architecture setup will be based on Python.

Development Setup

  gunicorn --workers 1 --timeout 300 --bind 0.0.0.0:8000 api:app
  - workers (INT): The number of worker processes for handling requests.
  - timeout (INT): Workers silent for more than this many seconds are killed and restarted.
  - bind (ADDRESS): The socket to bind. Default: ['127.0.0.1:8000']
  - api: The main Python file containing the Flask application.
  - app: An instance of the Flask class in the main Python file 'api.py'.
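The command above assumes a module `api.py` that exposes a Flask instance named `app`. A minimal sketch of such a file is shown below; the `/predict` route and the stand-in `predict` function are illustrative assumptions, not the article's actual service, and a real deployment would load a trained model instead:

```python
# api.py -- minimal Flask application that gunicorn serves as `api:app`.
# The /predict endpoint and dummy predict() below are illustrative;
# a real service would load a trained model (e.g. via joblib or torch.load).
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    # Stand-in for model inference: here we simply sum the input features.
    return sum(features)


@app.route("/predict", methods=["POST"])
def predict_route():
    # Parse the JSON request body and return the model's prediction.
    payload = request.get_json(force=True)
    score = predict(payload["features"])
    return jsonify({"prediction": score})
```

With this file in place, `gunicorn ... api:app` resolves `api` to the module and `app` to the Flask instance inside it.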

Production Setup
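The article does not spell out the production command, but a common pattern is to move the gunicorn flags into a config file and scale the worker count to the machine. A sketch under those assumptions (the filename `gunicorn_conf.py` and the `(2 × cores) + 1` worker rule of thumb from the gunicorn documentation are additions, not from the article); it would be launched with `gunicorn -c gunicorn_conf.py api:app`:

```python
# gunicorn_conf.py -- a sketch of a production gunicorn config file.
# Filename is an assumption; load it with: gunicorn -c gunicorn_conf.py api:app
import multiprocessing

bind = "0.0.0.0:8000"                           # same port as the dev command
workers = multiprocessing.cpu_count() * 2 + 1   # gunicorn docs' rule of thumb
timeout = 300                                   # allow long model inference calls
```

Keeping these settings in a file (rather than on the command line) makes the production invocation reproducible and easy to version-control.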


Additional Setup (Add-ons)

Apart from the usual setup, there are a few other things to take care of to make the setup self-sustaining in the long run.

Source: AWS (https://aws.amazon.com/devops/continuous-integration/)


Alternate platforms

There are other systems that provide a structured way to deploy and serve models in production; a few such systems are as follows:

TensorFlow Serving
Source: TensorFlow Serving

Docker
Source: Docker Architecture
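Docker packages the API and all of its dependencies into a portable image that runs identically in development and production. A minimal Dockerfile sketch, assuming the `api.py` Flask module from the gunicorn example and a `requirements.txt` (an assumed file) listing flask and gunicorn:

```dockerfile
# Sketch: containerize the Flask + gunicorn API.
# Assumes api.py (with the Flask `app` instance) and a requirements.txt
# listing flask and gunicorn sit alongside this Dockerfile.
FROM python:3.10-slim

WORKDIR /usr/src/app

# Install dependencies first so this layer is cached between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["gunicorn", "--workers", "1", "--timeout", "300", \
     "--bind", "0.0.0.0:8000", "api:app"]
```

Building with `docker build -t ml-api .` and running with `docker run -p 8000:8000 ml-api` would expose the same endpoint as the bare-metal setup.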

Michelangelo
Source: Michelangelo

Additional Resources

I hope you found this article useful and that it gave you a clear overview of the deployment process for Deep/Machine Learning models, from development to production.
