Tuesday, July 20, 2021

Azure @ Enterprise - Why I recommend Kubernetes as an app hosting platform

I started writing this post 2-3 years ago, around the time Apache Spark 2.3 added support for Kubernetes (K8s) in 2018. It was already obvious that Kubernetes was taking over the app hosting space the same way virtual machines took over from physical machines. Since everyone is expected to understand where the industry is moving and adopt it, I paused the post because there seemed to be nothing left for me to endorse. But it's time to resume it and publish.

If you are already using containers and K8s orchestration, feel free to skip this post and save time.

Basics

Containers

Please refer to my 2015 post, "Software Containerization via Docker - First look and some thoughts", for the very low-level basics. A container can be understood as a micro virtual machine that packages an application and its runtime, isolated from other containers. It has its own file system, can expose ports for networking, and can mount external storage.

Containers mainly co-evolved with the microservice architecture, where huge systems are developed and deployed as separate small services, each owning its own data and release cadence.

Kubernetes

Officially:
"Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation." - kubernetes.io

In short, it is a container orchestration engine.

What is container orchestration and why do we need it?

In my own words, orchestration covers:
  • Where to run
    • Horizontal scaling - A mechanism to use multiple nodes (VMs are called nodes) to run containers of the same system. Nodes are expected to fail at any time.
    • Deciding where to run the container - A mechanism to specify which node runs which container. E.g. one container in the solution may need a Linux node while another needs a Windows node, and some containers may need a GPU on the node.
    • This can also include virtual nodes, which can even be backed by Azure Container Instances.
  • Deployment and rollback
    • Ability to roll back easily in case something goes wrong in a deployment.
    • Failures in deployments are expected, so the orchestration system should be able to return to the previous known state.
  • Networking
    • Determines how the containers are allowed to communicate with each other and which ports are exposed to the outside.
    • Also provides load balancing.
    • Also provides routing.
  • Service discovery
    • Services/components in the system should be able to discover the endpoints of other services, as those services may be deployed to multiple nodes.
  • Resource allocation
    • Determines how much of the system's resources (CPU, memory) is given to each container.
  • Storage mounting
    • Ability to mount external storage.
  • Scaling
    • A mechanism to specify how many container instances run at a time. E.g. scaling out (horizontal scaling) based on the number of messages in an Azure queue.
  • Secret and configuration management
    • Secrets can be stored in Azure Key Vault and similar services, but that may not work on-prem, so an orchestration engine needs its own secret management.
    • Configuration management is another aspect of any software system. Containers need to be initialized with the proper configuration wherever they run.
  • CRON
    • Capability to run a container on a schedule defined by a CRON expression.
  • Health monitoring & self-healing
    • Monitors the state of the containers and makes sure it stays true to the definition of the system. E.g. if a container crashes, the orchestrator makes sure it is started again.
Hope the above explains what orchestration is and why we need it.
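
The self-healing and scaling points above both come down to one idea: compare the desired state with the observed state and close the gap. Below is a minimal Python sketch of that idea. It is a toy model, not how Kubernetes is implemented; the Cluster class, the reconcile function and the app names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy stand-in for a cluster: maps an app name to its running container ids."""
    running: Dict[str, List[str]] = field(default_factory=dict)
    next_id: int = 0

    def start(self, app: str) -> None:
        self.next_id += 1
        self.running.setdefault(app, []).append(f"{app}-{self.next_id}")

    def stop(self, app: str) -> None:
        self.running[app].pop()


def reconcile(desired: Dict[str, int], cluster: Cluster) -> None:
    """Drive the observed state toward the desired replica counts."""
    for app, want in desired.items():
        have = len(cluster.running.get(app, []))
        for _ in range(want - have):   # too few running: start more
            cluster.start(app)
        for _ in range(have - want):   # too many running: scale in
            cluster.stop(app)


desired_state = {"web": 3, "worker": 1}   # hypothetical system definition
cluster = Cluster()
reconcile(desired_state, cluster)         # initial rollout

cluster.running["web"].pop(0)             # simulate a crashed container
reconcile(desired_state, cluster)         # self-healing restores the count
print({app: len(ids) for app, ids in cluster.running.items()})
```

Changing a number in desired_state and calling reconcile again is the scaling case; the crash simulation is the self-healing case. A real orchestrator runs this kind of loop continuously rather than on demand.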

Links

There are people out there who can explain orchestration better than me.

Why Kubernetes?

It is the best container orchestration engine currently available. It supports most of the capabilities in the orchestration list above. For further reading, refer to the K8s documentation.

What Kubernetes is not?

It is equally important to understand what Kubernetes is not meant for. It
  • Doesn't have native CI/CD concepts.
  • Doesn't have a data storage engine. RDBMS rollbacks are not supported out of the box, so keep database changes backward compatible.
  • Doesn't have a built-in blue-green deployment strategy.
  • Doesn't have built-in serverless functions or event-driven execution.
  • Doesn't have a built-in message broker, service bus, etc. We have to use an external one or run a message broker in a container.
More in their documentation.

How and where to host

K8s can be run on top of virtual or physical machines or in the cloud.

On-premise

It can be installed on-premise on top of virtual machines or physical machines. Either way, the machines are called nodes. Control nodes need to run Linux, while worker nodes can run either Windows or Linux. If your organization is a hard-core Windows shop, no luck here.

Cloud

Major cloud providers offer managed Kubernetes services (for example AKS on Azure, EKS on AWS, and GKE on Google Cloud), which make it easier to host K8s applications.

Why Kubernetes is good for enterprises?

  • Vendor-neutral and portable
    • If a cloud provider increases fees, it is easy to switch from one vendor to another.
    • In the worst case, start and maintain an on-prem K8s cluster.
    • Never run your own cluster if you are a start-up hosting a single application.
  • No vendor lock-in of the kind that comes with a provider's native PaaS model
  • Environment parity
    • Fewer 'it works on my machine' comments from developers
  • High density
    • More workloads can be packed onto the same resources, with less wastage.
  • Multi-tenancy
    • Multiple apps of the enterprise can share a single cluster via namespace-based isolation.
  • Lower operations cost
    • Thanks to the high density and multi-tenant nature, it costs less to keep the cluster up and running.
  • A clear view of architecture
    • Just reviewing the YAML files gives a clear idea of how the system is designed. No more surprises caused by settings and configurations being spread across many places.
  • Repeatability
    • Since a container image is immutable, it can be run years later if we want to see how the app behaved earlier.
  • No need for private cloud software
    • An on-premise cloud can be built on the K8s cluster without any private cloud software such as OpenStack. The K8s cluster is the cloud, with applications separated by namespaces. The storage still needs to be managed, though.
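
To make the "clear view of architecture" point concrete, here is a small Python sketch that builds a Deployment definition as a plain dictionary and prints it as JSON. The field names follow the apps/v1 Deployment schema; the app name, image, namespace, port and resource figures are made-up values for illustration, and the deployment_manifest helper is my own, not part of any Kubernetes library.

```python
import json


def deployment_manifest(name, image, replicas, port, namespace="default"):
    """Build an apps/v1 Deployment definition as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,                      # how many pods to keep running
            "selector": {"matchLabels": dict(labels)}, # which pods belong to this Deployment
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # Resource allocation per container (illustrative numbers)
                            "resources": {
                                "requests": {"cpu": "100m", "memory": "128Mi"},
                                "limits": {"cpu": "500m", "memory": "256Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application values, chosen only for this example
manifest = deployment_manifest(
    name="orders-api",
    image="example.azurecr.io/orders-api:1.0.0",
    replicas=3,
    port=8080,
    namespace="orders",
)
print(json.dumps(manifest, indent=2))
```

Everything a reviewer needs to know about this component (image version, replica count, exposed port, resource limits, namespace) sits in one document. The same structure is normally written as YAML; JSON is used here only because Python can emit it without extra packages.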

Learning curve

The journey to Kubernetes will not be easy. Engineering teams (Dev and DevOps, if you have separate teams) need to be upskilled, as there are a lot of new concepts in K8s. Some of them are listed below:
  • Namespaces
  • Nodes - Control and worker nodes
  • Controllers - ReplicationController, Node controller
  • Deployment
  • ReplicaSet
  • Pod
  • StatefulSet
  • DaemonSet
  • Service
  • Job, CronJob
  • Volume, Persistent Volume, etc...
  • ConfigMaps
  • Secrets
  • Policies, Resource Quota
  • Selectors
  • YAML (Not that specific to K8s)
  • kubectl
I couldn't find a comprehensive mind-map showing all of these, so I am planning to prepare one. It is advisable to upskill the team by hiring external people who already have experience with K8s. Turning an existing team that doesn't know K8s into experts is tough unless you are OK with accepting failures.
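
Of the concepts listed above, selectors are a good first one to get comfortable with, because they are what ties Services, Deployments and Pods together. The Python sketch below models equality-based label matching only. The pod names and labels are made up, and the matches function is my own illustration, not a Kubernetes API.

```python
from typing import Dict, List


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based match: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods with their labels
pods: List[Dict] = [
    {"name": "orders-api-1", "labels": {"app": "orders-api", "tier": "backend"}},
    {"name": "orders-api-2", "labels": {"app": "orders-api", "tier": "backend"}},
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
]

# A Service that should route traffic to every pod labeled app=orders-api
service_selector = {"app": "orders-api"}
endpoints = [pod["name"] for pod in pods if matches(service_selector, pod["labels"])]
print(endpoints)
```

Because the link is made through labels rather than fixed names or addresses, pods can come and go on any node and the Service keeps finding them, which is the service discovery capability in practice.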
