Deploying a TensorFlow model on a Jetson Nano using TensorFlow serving and K3s

The Nvidia Jetson Nano is a low-cost platform for AI applications, ideal for edge computing. However, due to its ARM-based CPU architecture, deploying applications to the SBC can be challenging. In this guide, we'll install and configure K3s, a lightweight Kubernetes distribution made specifically for edge devices. Once that is done, we'll build and deploy a TensorFlow model in the K3s cluster.

Environment preparation

Our objective is to deploy a TensorFlow Serving container in a Kubernetes cluster running on the Jetson Nano. Moreover, this container should take full advantage of the CUDA capabilities of the Nano. Consequently, some preliminary environment preparation is necessary, namely configuring Docker and installing K3s. The following is based on this guide. Additionally, this article assumes that the reader has a Docker container registry available to push to and pull from.
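
If you don't already have a registry at hand, a minimal option is to run the official registry image on a machine of your local network. Here is a sketch, assuming the registry will be reachable at 192.168.1.2:5000, the address used in the rest of this guide:

# Start a private Docker registry listening on port 5000 (assumed address: 192.168.1.2:5000)
docker run -d -p 5000:5000 --restart=always --name registry registry:2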

Docker configuration

JetPack, the default Linux distribution that comes with the Jetson Nano, ships with Docker preinstalled. However, Docker must be configured to take advantage of the Jetson Nano's GPU capabilities. This can be done by editing the /etc/docker/daemon.json file. Moreover, if we intend to use a private Docker registry served over HTTP, its address must be specified in the same file. Here is how the file should look after modification:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "insecure-registries" : ["192.168.1.2:5000"]
}

Here, 192.168.1.2:5000 is to be replaced by the URL of your registry.

These changes can be applied by restarting Docker:

sudo systemctl restart docker
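
To confirm that Docker picked up the new configuration, one can check that the nvidia runtime is now the default (the exact output varies with the Docker version):

# "nvidia" should appear both in the list of runtimes and as the default runtime
docker info | grep -i runtime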

K3s installation

The objective is to deploy an AI model to a Kubernetes cluster running on the Jetson Nano. However, due to the Nano's architecture, most Kubernetes distributions, such as MicroK8s, cannot run on it. This is why we are going to use K3s, a Kubernetes distribution made specifically for edge devices. K3s can be installed as follows:

curl -sfL https://get.k3s.io | sh -

By default, K3s uses containerd as its container engine. However, here we would prefer to use Docker, since we've configured it to fit our needs in the previous step. This can be done by editing the /etc/systemd/system/k3s.service file and adding the --docker flag to the ExecStart line as follows:

ExecStart=/usr/local/bin/k3s \
    server \
    --docker

Since a systemd unit file was modified, the systemd configuration must be reloaded and K3s restarted:

sudo systemctl daemon-reload
sudo systemctl restart k3s

Further details can be found in the official documentation.
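
One way to confirm that K3s is now using Docker is to inspect the node's container runtime:

# The CONTAINER-RUNTIME column should report docker:// rather than containerd://
sudo kubectl get nodes -o wide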

To check whether K3s can properly use the Jetson Nano's GPU, one can deploy the following container to the cluster:

sudo kubectl run -i -t gpu-check --image=jitteam/devicequery --restart=Never
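
If the GPU is correctly exposed, the deviceQuery output should describe the Nano's GPU and end with Result = PASS. The pod can then be removed:

sudo kubectl delete pod gpu-check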

TensorFlow Serving

First, we are going to need a model to deploy. Here is the model from the TensorFlow getting started example, which classifies images from the MNIST dataset:

# Importing TensorFlow
import tensorflow as tf

# Loading the data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Data preprocessing (here, normalization)
x_train, x_test = x_train / 255.0, x_test / 255.0

# Building the model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Loss function declaration
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Model compilation
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

# Training
model.fit(x_train, y_train, epochs=5)

# Exporting
model.save('./mymodel/1/')

Running this snippet should generate a folder called mymodel in the working directory, containing our exported model. The next step is to embed the model in a TensorFlow Serving container.
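
Optionally, the exported SavedModel can be inspected with the saved_model_cli tool that ships with TensorFlow, to check that it exposes the expected serving signature:

# List the MetaGraphs and signature definitions of the exported model
saved_model_cli show --dir ./mymodel/1/ --all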

The official TensorFlow Serving image is unfortunately incompatible with the Jetson Nano's architecture. However, Docker Hub user helmuthva provides this alternative, which works nicely. It can be used just like a regular TensorFlow Serving image:

docker run -d --name serving_base helmuthva/jetson-nano-tensorflow-serving
docker cp ./mymodel serving_base:/models/mymodel
docker commit --change "ENV MODEL_NAME mymodel" serving_base my-registry/mymodel-serving

The container image can then be pushed to our registry, making it available for Kubernetes to deploy:

docker push my-registry/mymodel-serving
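
Before involving Kubernetes, the image can be tested locally. Assuming it behaves like the official TensorFlow Serving image and serves its REST API on port 8501, the model status endpoint should respond:

# Run the serving container locally and query the model status endpoint
docker run -d --name mymodel-test -p 8501:8501 my-registry/mymodel-serving
curl http://localhost:8501/v1/models/mymodel
# Remove the test container once done
docker rm -f mymodel-test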

Kubernetes

Now that an image of our TensorFlow Serving container is available in our container registry, it is time to deploy it to the Kubernetes cluster. To do so, we create a Kubernetes manifest file, for example kubernetes_manifest.yml, with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymodel-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymodel-serving
  template:
    metadata:
      labels:
        app: mymodel-serving
    spec:
      containers:
      - name: mymodel-serving
        image: my-registry/mymodel-serving
        ports:
        - containerPort: 8501
---
apiVersion: v1
kind: Service
metadata:
  name: mymodel-serving
spec:
  ports:
  - port: 8501
    nodePort: 30111
  selector:
    app: mymodel-serving
  type: NodePort

These resources can be created using kubectl apply:

kubectl apply -f kubernetes_manifest.yml
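
The state of the deployment and of the service can then be checked with kubectl:

# The pod should eventually reach the Running state
kubectl get pods -l app=mymodel-serving
# The service should expose port 8501 on node port 30111
kubectl get svc mymodel-serving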

The model should now be deployed to the Kubernetes cluster. This can be verified by pointing a web browser to http://<Kubernetes cluster IP>:30111/v1/models/mymodel, which should yield the following JSON:

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
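
As a final sanity check, here is a minimal sketch of a prediction request against TensorFlow Serving's REST predict endpoint. It sends a single all-zero 28x28 image, so the predicted class is meaningless, but a successful response confirms that the whole pipeline works (as before, <Kubernetes cluster IP> is to be replaced by the address of the Jetson Nano):

# Build a dummy 28x28 input and send it to the predict endpoint; the response
# should contain a "predictions" field with 10 logits
python3 -c 'import json; print(json.dumps({"instances": [[[0.0]*28]*28]}))' \
  | curl -s -X POST -d @- http://<Kubernetes cluster IP>:30111/v1/models/mymodel:predict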