Deployment of a TensorFlow model to Kubernetes

Let’s imagine that you’ve just finished training your new TensorFlow model and want to start using it in your application(s). One obvious way to do so is to simply import it in the source code of every application that uses it. However, it is often more flexible to keep the model in one place as a standalone service and simply have applications exchange data with it through API calls. This article walks through the steps of building such a system and deploying the result to Kubernetes.

This guide is based on the official TensorFlow Serving documentation.

The TensorFlow Model

First, we need a model to work with. For this purpose, here is a snippet, taken from the TensorFlow getting started guide, that builds and trains a simple Keras model:

import tensorflow as tf

# Data loading, preprocessing etc...

# Building the model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Compiling the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

The trained model can then be used within the same code to make predictions. However, the model would be more useful if it could also be used by other applications, potentially deployed on different computers. To do so, the model can be exposed to HTTP requests using a REST API. Although this could be done by using Flask to create an API endpoint for our model, TensorFlow Serving offers a similar solution without requiring a complicated setup for developers.
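For a sense of what the hand-rolled alternative involves, here is a hypothetical sketch of a Flask endpoint wrapping the in-memory model from the snippet above (the route name and port are arbitrary):

import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body such as {"instances": [[...28x28 pixel values...]]}
    instances = np.array(request.get_json()['instances'])
    predictions = model.predict(instances)  # `model` is the trained Keras model from above
    return jsonify({'predictions': predictions.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

With TensorFlow Serving, this kind of glue code does not need to be written, versioned, or maintained.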

TensorFlow Serving is distributed as a Docker image into which the desired model is copied. The resulting container contains the necessary logic to expose the model to HTTP requests.

To use TensorFlow Serving, one must first export the model using the save function of Keras:

MODEL_NAME = 'mymodel'
MODEL_VERSION = 1
model.save('./{}/{}/'.format(MODEL_NAME, MODEL_VERSION))

Note how the model name and version are defined: TensorFlow Serving expects the export path to follow the layout <model name>/<version>/.

The model now exists in the folder ./mymodel/1/.
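Before packaging the model, it can be worth verifying that the export produced a valid SavedModel. Here is a quick, optional check:

import tensorflow as tf

# Load the exported SavedModel back and list its serving signatures.
loaded = tf.saved_model.load('./mymodel/1/')
print(list(loaded.signatures.keys()))  # typically ['serving_default']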

Preparing a TensorFlow Serving container

Now, a stock TensorFlow Serving image can be pulled from Docker Hub and run as a local container:

docker run -d --name serving_base tensorflow/serving

With the container running, one can now copy the exported model into it using:

docker cp ./mymodel serving_base:/models/mymodel

The container now contains our model and can be saved as a new image. This can be done by using the docker commit command:

docker commit --change "ENV MODEL_NAME mymodel" serving_base my-registry/mymodel-serving

Note here that my-registry is the URL of the docker registry to push the image to.

Once done, we can get rid of the original TensorFlow Serving container:

docker kill serving_base
docker rm serving_base

At this point, it is a good idea to check that the new image actually works, so let's run it:

docker run -d -p 8501:8501 my-registry/mymodel-serving

Note that 8501 is the port TensorFlow Serving uses for its REST API.

A GET request to http://localhost:8501/v1/models/mymodel should return the following JSON:

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
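
The same check can also be scripted, for example in Python with the requests library:

import requests

# Query the model status endpoint of the locally running container.
status = requests.get('http://localhost:8501/v1/models/mymodel').json()
state = status['model_version_status'][0]['state']
print('Model state:', state)  # expected: AVAILABLE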

If everything is successful so far, the container can be pushed to the registry, which will make it available for Kubernetes to deploy:

docker push my-registry/mymodel-serving

Deploying the container to Kubernetes

Now that the image has been pushed to a registry, it can be deployed to our Kubernetes cluster. This is achieved by creating two resources in the cluster: a Deployment and a Service. The Deployment runs the application itself, while the Service allows users to reach the Deployment from outside the cluster. Here, we will use a NodePort Service so that our TensorFlow Serving container can be accessed from outside the cluster on a dedicated port; we will use port 30111.

Creating those resources is done simply by applying the content of a YAML manifest file with the kubectl command. In our case, here is the content of our kubernetes_manifest.yml file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymodel-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymodel-serving
  template:
    metadata:
      labels:
        app: mymodel-serving
    spec:
      containers:
      - name: mymodel-serving
        image: my-registry/mymodel-serving
        ports:
        - containerPort: 8501
---
apiVersion: v1
kind: Service
metadata:
  name: mymodel-serving
spec:
  ports:
  - port: 8501
    nodePort: 30111
  selector:
    app: mymodel-serving
  type: NodePort

The resources can then be created by executing:

kubectl apply -f kubernetes_manifest.yml

The container should now be deployed in the Kubernetes cluster.
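
Pulling the image and starting the pod can take a moment. If you want to script the check, a small sketch like the following polls the model status endpoint through the NodePort until the model is available (replace <IP of the Cluster> with the address of any cluster node):

import time
import requests

STATUS_URL = 'http://<IP of the Cluster>:30111/v1/models/mymodel'

# Poll the model status endpoint until TensorFlow Serving reports it as AVAILABLE.
for attempt in range(30):
    try:
        state = requests.get(STATUS_URL, timeout=5).json()['model_version_status'][0]['state']
        if state == 'AVAILABLE':
            print('Model is being served by the cluster.')
            break
    except (requests.exceptions.RequestException, KeyError):
        pass  # the pod may still be pulling the image or starting up
    time.sleep(5)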

Using the container

The AI model deployed in Kubernetes can now be used for predictions. To do so, one must send a POST request to the prediction API of the TensorFlow Serving container, with a JSON body containing the input data. The model then replies with its prediction, also in JSON format. Here is an example of how this can be implemented in Python, using the requests library:

import cv2, json, requests

# Read the image as grayscale so it matches the model's (28, 28) input shape
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
payload = json.dumps({'instances': [image.tolist()]})
r = requests.post("http://<IP of the Cluster>:30111/v1/models/mymodel:predict", data=payload)

Note here that the image data sent to the AI model is embedded in an array. This is because of the way TensorFlow Serving accepts input data: in our case, the input image is 28x28 pixels, so the data sent to the AI model must have a (n, 28, 28) shape, where n is the number of images to evaluate.
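
The prediction itself can then be read from the response. Since the model above outputs raw logits for 10 classes, interpreting the result might look like this (a small sketch using numpy):

import numpy as np

# TensorFlow Serving returns the model outputs under the 'predictions' key.
logits = np.array(r.json()['predictions'])   # shape: (n, 10)
predicted_classes = logits.argmax(axis=1)    # index of the highest logit per image
print(predicted_classes)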