
You probably don't need JupyterHub on Kubernetes.


To deploy Jupyter notebooks on Kubernetes using open-source software, currently there are two major approaches to choose from:

Make notebooks a core feature on Kubernetes

This is usually done with a CustomResourceDefinition (CRD), which makes Kubernetes treat a Notebook the way it treats a Pod or a Secret. The resulting custom resources (CRs) are backed by an Operator that knows the notebook management logic and takes charge of your notebooks based on the configuration you provide.
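To make the idea concrete, here is a minimal sketch of what a CRD and one matching custom resource could look like, built as plain Python dictionaries. The API group `example.com`, the `image` field, and the function names are assumptions made up for illustration; they are not the schema of any real Operator.

```python
import json

GROUP = "example.com"  # hypothetical API group, for illustration only
VERSION = "v1"


def notebook_crd() -> dict:
    """Build a minimal CustomResourceDefinition that registers a Notebook kind."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # Kubernetes requires the CRD name to be "<plural>.<group>".
        "metadata": {"name": f"notebooks.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {
                "kind": "Notebook",
                "plural": "notebooks",
                "singular": "notebook",
            },
            "versions": [
                {
                    "name": VERSION,
                    "served": True,
                    "storage": True,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    # Hypothetical single field: the notebook image.
                                    "properties": {"image": {"type": "string"}},
                                }
                            },
                        }
                    },
                }
            ],
        },
    }


def notebook_resource(name: str, image: str) -> dict:
    """Build one custom resource of the Notebook kind declared above."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "Notebook",
        "metadata": {"name": name},
        "spec": {"image": image},
    }


if __name__ == "__main__":
    print(json.dumps(notebook_crd(), indent=2))
    print(json.dumps(notebook_resource("alice", "jupyter/base-notebook"), indent=2))
```

Applying these two objects only stores them in the cluster. Nothing actually starts until an Operator watches for `Notebook` resources and creates the Pods behind them, which is the extra moving part this approach brings in.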

While this approach is well-integrated with the Kubernetes ecosystem, it also adds complexity and a significant maintenance burden even for those familiar with Kubernetes. It is necessary to maintain the CRD and learn how to interact with the Operator.

The most familiar and well-maintained of these is the one Kubeflow provides. The problem is that you need to deploy many other components (if not the entire stack) to get access to the Operator. Even if we could deploy the Operator on its own, I still think we don't really need an extra Operator looking after our notebooks: Kubernetes can take care of them on its own.
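As a rough illustration of Kubernetes taking care of a notebook on its own, the sketch below builds a plain `apps/v1` Deployment for a single notebook server, with no custom resource involved. The function name, the labels, the `jupyter/base-notebook` image, and port 8888 are assumptions chosen for the example, not a description of how any particular tool does it.

```python
import json


def notebook_deployment(name: str, image: str = "jupyter/base-notebook", port: int = 8888) -> dict:
    """Return a built-in apps/v1 Deployment manifest for one notebook server."""
    # The same labels are used by the selector and the Pod template,
    # which is how the Deployment finds the Pods it owns.
    labels = {"app": "notebook", "notebook": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "notebook",
                            "image": image,  # assumed image, swap in your own
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(notebook_deployment("alice"), indent=2))
```

The built-in Deployment controller already restarts the Pod if it dies, so the scheduling and lifecycle work needs no extra Operator. A real setup would still need a Service, storage, and authentication on top of this.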

Run JupyterHub on Kubernetes

JupyterHub has been around for a long time and was serving notebooks even before Kubernetes gained momentum. As more people started running applications inside Kubernetes, JupyterHub was compelled to run, as is, inside Kubernetes as well.

This approach avoids reinventing the wheel by using purpose-built adapters such as KubeSpawner and those in zero-to-jupyterhub-k8s. As a result, people who are used to managing notebooks outside of Kubernetes won't feel lost.

The drawback of this approach is that it relies on "glue code" to connect JupyterHub with Kubernetes, which is hacky and introduces feature redundancy between the two.

Enter notebook-on-kube

notebook-on-kube is a simple Python application based on FastAPI that:

notebook-on-kube leverages the existing features and tools that are designed to run applications on Kubernetes, providing a third, middle-ground approach that is easy to maintain and well-integrated for managing notebooks on Kubernetes. Give it a try!

The photo below illustrates the hardware equivalent of "glue code". Whether Jupyter represents the flash drive or the system unit in the photo is open to debate. However, it is widely acknowledged that it would be significantly more convenient if we could directly plug in the flash drive.

source: the Internet

The approach I have demonstrated here can be extended to any other legacy software. With Kubernetes becoming the new Linux, let's make our applications Kubernetes-friendly, particularly when the process is straightforward.