Tuesday, August 3, 2021

Getting started with Kubernetes dev environment using Docker Desktop

This is the 3rd or 4th time I am doing hands-on Kubernetes (hereafter mostly referred to by its short form, K8s) learning sessions. Every time I learn the kubectl command and its options, I forget them as there were no chances to apply them in the day job. Another mistake I made all those times was failing to post the learning to this blog.

Hopefully this time I will get a chance to use it in the day job and will not miss posting the Kubernetes learning to this blog.

This post is very basic. The aim is to get started with the Kubernetes development environment using Docker Desktop. Below are the steps at a high level to get started. Detailed steps with videos are available on the internet.

Install Docker with Kubernetes support

The first step is to download Docker Desktop and install it. It's as straightforward as installing any other software on Windows. It is free and requires virtualization to be turned on.
In order to run Linux containers, it is better to enable the WSL2 backend. There are detailed instructions available to get it done, including how to enable WSL2 on Windows.

Kubernetes support is only available when we use the Linux container mode. Note that Docker Desktop supports both Windows and Linux modes.

Points to note

  • It is better not to take the experimental build; stay one version below the current stable version.
  • This installation will modify the hosts file in Windows. It is better not to change those entries.
  • It is better to have an 8-core, 16 GB machine to work smoothly with container workloads.

Install Kubernetes dashboard

For beginners, it is easy to understand what is going on in the K8s cluster by looking at a UI dashboard application. K8s does not install a UI dashboard by default. The dashboard can be installed using the kubectl commands below (they run the same from PowerShell).
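A minimal sketch of the installation, assuming a v2.x release of the dashboard (check the kubernetes/dashboard releases page for the exact current version before applying):

```shell
# Deploy the Kubernetes dashboard from its official manifest
# (the version tag below is an example; verify the latest release)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml

# Start a local proxy so the dashboard is reachable from the browser
kubectl proxy

# The dashboard is then available via the proxy at:
# http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
```

Logging in to the dashboard additionally needs a service account token or kubeconfig; the dashboard documentation covers creating a sample user.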

Tuesday, July 27, 2021

Zero Trust security model

This post is about a design approach for loosely coupled components of a system or a group of networked devices. Those components are normally different processes serving a WebAPI, a database, or consuming applications. Or, if we think from a networking aspect, those are different devices inside a network. Mainly those are inside the perimeter network of an enterprise.

More than programming, this is about enterprise architecture, and there are no code snippets. Please continue if interested.

What came before the Zero Trust security model

Before the Zero Trust security model, there were models where only the external endpoints were protected. Whatever happened inside a trusted network area was considered secure. Some examples below:

  • An AD server simply responds to the requests that come from a web server inside the trusted network.
  • If the web server exposes only 443 for external traffic and uses 80 for internal services, it simply responds to requests on 80 without authenticating. This is based on the assumption that such requests can only originate from within that server.
This is applicable when the network is fully managed and has a clearly defined boundary.

Zero Trust model

When systems became more cloud-friendly or hybrid, they started spanning multiple networks. For example, it is not always feasible to store big data within the enterprise network; often it is offloaded to cloud services, where the data can be analyzed using modern big data tools such as Spark.
Sometimes, due to regulatory requirements, data needs to be stored on-premise but compute can be in the cloud. An enterprise may need to run big Spark clusters in the cloud but access the data from on-premise storage.

Often those networks are not controlled by the enterprise. The concept of inherent trust is no longer applicable in this new world.

This is the reason the Zero Trust model gained importance. The term was coined by John Kindervag in 2010, but there seem to be traces of it from as early as 1994 according to the internet. From my perspective, more than who invented it, I focus on what it is.

The main pillars of the Zero Trust model are:
  • Centralized identity of users, devices, and applications
  • Verification of those identities before serving requests
  • Authorization based on least-privileged access
  • Assuming there can be a breach, but making sure firewalls, API gateways, and monitoring are in place
  • A segmented network so that the attack surface area is small

Implementation guidelines for the Zero Trust model are available for the major clouds such as Azure, AWS, and GCP. They are also available for Kubernetes clusters where people host microservices.


This is a buzzword, like Microservices and Serverless were at one time. Knowingly or unknowingly, we will reach this model as the cloud is inevitable for enterprises. Please note this was in the ThoughtWorks technology radar in 2020 but was later removed.

Tuesday, July 20, 2021

Azure @ Enterprise - Why I recommend Kubernetes as an app hosting platform

I started writing this post 2-3 years back, mainly when Apache Spark 2.3 started supporting Kubernetes (K8s) in 2018. It was obvious that Kubernetes was taking over the app hosting space the same way virtual machines took over physical machines. Everyone is expected to understand where the industry is moving and adopt it, hence I paused this post as there was nothing I needed to endorse. But it's time to resume this post and publish it.

If you are already using containers and K8s orchestration, feel free to skip this post and save time.

Basics

Containers

Please refer to my 2015 post about Software Containerization via Docker - First look and some thoughts for very low-level basics. A container can be understood as a micro virtual machine with the application(s) and runtime isolated from each other. It has a file system inside it, can expose ports for networking, and can mount external storage.
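Those capabilities can be seen with a stock image. A minimal sketch, assuming Docker is installed and a local `site` folder exists (the name and paths are just examples):

```shell
# Run nginx detached, map host port 8080 to container port 80,
# and mount a host folder as the web root (read-only)
docker run -d --name web -p 8080:80 \
  -v "$(pwd)/site:/usr/share/nginx/html:ro" \
  nginx:alpine

# The container has its own isolated file system; list the mounted folder
docker exec web ls /usr/share/nginx/html

# Clean up
docker rm -f web
```

The same image runs unchanged on any machine with a container runtime, which is what makes containers a good unit of deployment.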

This mainly co-evolved with the microservice architecture, where huge systems are developed and deployed separately as small services that own their own data and release cadence.

Kubernetes

Officially 
"Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation." - Kubernetes.io

In short, it is a container orchestration engine.

What is container orchestration and why do we need it?

In my own words, orchestration covers:
  • Where to run
    • Horizontal scaling - Mechanism to use multiple nodes (VMs are called nodes) to run containers of the same system. Nodes are expected to fail at any time.
    • Deciding where to run the container - The mechanism to specify which node executes which container. Eg: a particular container in the solution may need to run on a Linux node, another may need a Windows node. Some containers may need a GPU to be present on the node.
    • Can also include virtual nodes, which can even be backed by Azure Container Instances.
  • Deployment and rollback
    • Ability to roll back easily in case something goes wrong in a deployment
    • Failures in deployments are expected, but the orchestration system should be able to roll back easily to the previous known-good state.
  • Networking
    • Determine how the containers are allowed to communicate with each other and which ports are exposed to the outside.
    • Also has load balancing capability
    • Also has routing capability.
  • Service discovery
    • The services/components in the system should be able to discover the endpoints to other services as the services may be deployed to multiple nodes.
  • Resource allocation
    • Determines how much system resources need to be given to each container.
  • Storage mounting
    • Ability to mount external storage
  • Scaling
    • The mechanism to specify how many container instances should be running at a time. Eg: scale out / horizontally based on the number of Azure queue messages.
  • Secret and configuration management
    • Secrets can even be stored in Azure KeyVault and similar services, but that may not work on-prem. An orchestration engine needs to have secret management.
    • Configuration management is another aspect of any software system. The containers need to be initialized with proper configuration wherever it runs.
  • CRON
    • Capability to execute a container based on CRON expression.
  • Health monitoring & self-healing
    • Monitor the state of the containers and make sure the state is true to the definition of the system. Eg: If a container crashes, the orchestrator makes sure it's started again.
Hope the above explains the what and why of orchestration.
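Several of these capabilities (replica count, resource allocation, self-healing probes, and service discovery) come together in a single declarative definition. A minimal sketch, using a stock nginx image as a stand-in for a real app:

```shell
# Apply a minimal Deployment + Service (image is a placeholder)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-web
spec:
  replicas: 3                 # scaling: keep three pod instances running
  selector:
    matchLabels:
      app: demo-web
  template:
    metadata:
      labels:
        app: demo-web
    spec:
      containers:
      - name: web
        image: nginx:alpine   # stand-in for your application image
        ports:
        - containerPort: 80
        resources:            # resource allocation per container
          requests:
            cpu: 100m
            memory: 64Mi
        livenessProbe:        # self-healing: restart the container on failure
          httpGet:
            path: /
            port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-web              # service discovery: resolvable as 'demo-web'
spec:
  selector:
    app: demo-web
  ports:
  - port: 80
EOF
```

If a node dies or a container crashes, K8s reschedules pods elsewhere to keep the declared state true, which is the essence of orchestration.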

Links

There are people out there who can explain orchestration better than me.

Why Kubernetes?

It is currently the best available container orchestration engine. It supports most of the capabilities mentioned in the above list of what and why orchestration. For further reading, refer to the K8s documentation.

What Kubernetes is not

It is equally important to understand what Kubernetes is not for. It:
  • Doesn't have native CI/CD concepts.
  • Doesn't have a data storage engine. RDBMS rollbacks are not supported out of the box, so it is better to keep database changes backward compatible.
  • Doesn't have a built-in blue-green deployment strategy.
  • Doesn't have built-in serverless functions or event-driven execution.
  • Doesn't have a built-in message broker, service bus, etc. We have to use an external one or run a container for the message broker.
More details are in their documentation.

How and where to host

K8s can be run on top of virtual or physical machines or in the cloud.

On-premise

It can be installed on-premise on top of virtual machines or physical machines; the machines are called nodes. Control-plane nodes need to be Linux, while worker nodes can be either Windows or Linux. If your organization is a hard-core Windows shop, no luck here.

Cloud

Major cloud providers provide an easier way to host K8s applications.

Why is Kubernetes good for enterprises?

  • Vendor-neutral and portable
    • If a cloud provider increases fees, it is easy to switch from one vendor to another
    • Worst case, start and maintain an on-prem K8s cluster
    • Never run your own cluster if you are a start-up hosting a single application
  • No vendor lock-in, unlike native PaaS models
  • Environment parity
    • Fewer 'it works on my machine' comments from developers
  • High density
    • Can pack more workloads into the same resources. Less wastage.
  • Multi-tenancy
    • Multiple apps of the enterprise can share a single cluster via namespaces-based isolation.
  • Lower operations cost
    • Due to the high density and multi-tenant nature, there is less operational cost to keep the cluster up and running.
  • A clear view of architecture
    • Just reviewing the YAML files gives a clear idea of how the system is designed. No more surprises from settings and configurations spread across many places.
  • Repeatability
    • Since the container image is immutable, it can be used years later if we want to see how the app ran earlier.
  • No need for private cloud software
    • An on-premise cloud can be created on a K8s cluster without any private cloud software such as OpenStack. The K8s cluster is the cloud, where applications are separated by namespaces. Oh yes, the storage still needs to be managed.
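The namespace-based isolation and multi-tenancy mentioned above can be sketched as follows (the team name and limits are made-up examples):

```shell
# Create a namespace per team/app to isolate their workloads
kubectl create namespace team-a

# Cap what that namespace can consume with a ResourceQuota
kubectl apply -n team-a -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
spec:
  hard:
    requests.cpu: "4"       # total CPU the namespace may request
    requests.memory: 8Gi    # total memory the namespace may request
    pods: "20"              # max number of pods in the namespace
EOF
```

This is how multiple apps of the enterprise can share a single cluster without starving each other of resources.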

Learning curve

It will not be an easy journey to Kubernetes. Engineering teams (Dev and DevOps, if you have separate teams) need to be upskilled as there are a lot of new concepts in K8s. Some below:
  • Namespaces
  • Nodes - Control and worker nodes
  • Controllers - ReplicaController, NodeController
  • Deployment
  • ReplicaSet
  • Pod
  • StatefulSet
  • DaemonSet
  • Service
  • Job, CronJob
  • Volume, Persistent Volume, etc...
  • ConfigMaps
  • Secrets
  • Policies, Resource Quota
  • Selectors
  • YAML (Not that specific to K8s)
  • Kubectl
I didn't find a comprehensive mind map showing all of these, so I am planning to prepare one. It is advisable to upskill the team by hiring external people who already have experience with K8s. Converting an existing team that doesn't know K8s into experts is tough unless you are ok with accepting failures.