Tuesday, April 24, 2018

Azure @ Enterprise - Automation for scaling down HDInsight after busy hours

Introduction

Every software system involves many manual jobs to maintain it: monitoring health, periodically maintaining indexes, updating operating systems, taking backups, and so on. Azure tries to automate those tasks as much as possible through a service called Azure Automation. This post is about using Azure Automation to automatically scale down HDInsight (HDI) clusters during off-peak hours.

Ideally, the cluster could be deleted after use. The problem is that a cluster takes a long time to create, so a job submitted after hours would first have to wait for the cluster to be created. Scaling down is therefore often the better option. As always, it's not a silver bullet; choose accordingly.

Azure Automation basics

There are many tutorials available for getting started with Azure Automation. Below is some high-level information about it.
  • Automation allows us to write Runbooks in PowerShell, PowerShell Workflow, or Python. The Runbook holds the actual logic of what to do.
  • Runbooks can even be graphical, i.e. drag-and-drop coding.
  • A Runbook can be exposed via a Webhook or scheduled to trigger execution. Each execution is called a Job, and we get a JobId to track its progress.
  • Automation clearly separates credentials and connections from the code, so those secrets can be managed by the deployment team.
  • It has an app-store model for distributing Runbooks, called the Runbook Gallery. We can upload our own Runbooks too.
Some links on how to do things in Automation:

https://docs.microsoft.com/en-us/azure/automation/automation-first-runbook-textual
https://vincentlauzon.com/2015/11/01/azure-runbook-a-complete-simple-example/
https://docs.microsoft.com/en-us/azure/automation/automation-runbook-gallery
https://www.c-sharpcorner.com/article/create-runbooks-to-automate-tasks-in-azure-automation/

Gallery

Coming back to the context of this post: we have to automatically scale down the HDInsight cluster during off-peak hours. Doesn't a Runbook already exist in the gallery to do this?

Yes, there is one, but it doesn't seem to work if we have multiple subscriptions. Hence this post.

I contacted the author of the Runbook and reported this. Hopefully there will be an updated version soon. Below is the link.

Below is a link about the Runbook Gallery to get started.

Runbook for scaling down HDICluster

The approach is straightforward, so here is the code directly:

    $SubscriptionName = "<Subscription Name>"
    $ResourceGroupName = "<Resource Group where the HDICluster resides not the automation account>" 
    $ClusterName = "<name of the cluster. No need of azurehdinsight.net>"
    [Int] $Nodes = 1
    
    $connectionName = "AzureRunAsConnection"
    Write-Verbose "Starting Scaling cluster $ClusterName to $Nodes nodes..."
    try
    {
        # Get the connection "AzureRunAsConnection"
        $servicePrincipalConnection=Get-AutomationConnection -Name $connectionName         

        Write-Output "Logging in to Azure..."
        Add-AzureRmAccount `
            -ServicePrincipal `
            -TenantId $servicePrincipalConnection.TenantId `
            -ApplicationId $servicePrincipalConnection.ApplicationId `
            -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint 
    }
    catch {
        if (!$servicePrincipalConnection)
        {
            $ErrorMessage = "Connection $connectionName not found."
            throw $ErrorMessage
        } else{
            Write-Error -Message $_.Exception
            throw $_.Exception
        }
    }
    # Select the subscription explicitly; the Run As identity may have access to more than one
    Select-AzureRmSubscription -SubscriptionName $SubscriptionName
    Write-Output "Scaling cluster $ClusterName to $Nodes nodes..."
    Set-AzureRmHDInsightClusterSize `
        -ResourceGroupName $ResourceGroupName `
        -ClusterName $ClusterName `
        -TargetInstanceCount $Nodes

Please note that parameters are omitted to keep the code short. Ideally, these values should come in as parameters.
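
As a rough sketch of that parameterized form, the four hard-coded assignments at the top of the Runbook could be replaced by a param block like the one below. The parameter names simply mirror the variables used above; the Mandatory and ValidateRange attributes and the upper bound of 100 are illustrative assumptions, not part of the tested Runbook.

```powershell
# Sketch only: replaces the hard-coded $SubscriptionName, $ResourceGroupName,
# $ClusterName and $Nodes assignments. The param block must be the first
# statement in the Runbook.
param(
    [Parameter(Mandatory = $true)]
    [string] $SubscriptionName,

    # Resource group of the HDInsight cluster, not of the Automation account
    [Parameter(Mandatory = $true)]
    [string] $ResourceGroupName,

    # Cluster name without the azurehdinsight.net suffix
    [Parameter(Mandatory = $true)]
    [string] $ClusterName,

    # Target node count; the 1..100 range is an assumed sanity check
    [ValidateRange(1, 100)]
    [int] $Nodes = 1
)
```

With this in place, the values can be supplied when the Runbook is started manually or when it is linked to a Schedule.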

Prerequisites

Azure AD App

"AzureRunAsConnection" is the name of the connection at the Automation account level that serves as the identity of the Runbook code.
The above code was tested with an Azure AD application using certificate-based authentication. The certificate associated with the Azure AD app has to be uploaded to the Automation account. Credentials are not held at the Runbook level, meaning multiple Runbooks in the Automation account can share the same credentials.

Importing modules

The Runbook needs two additional modules that are not already present in Automation.
  • AzureRM.profile
  • AzureRM.HDInsight
Below is a link on how to add modules.


The interesting thing here is that AzureRM.HDInsight depends on AzureRM.profile, so AzureRM.profile has to be added first. Although the message says it has been added, the import is an asynchronous operation. If we try to add AzureRM.HDInsight before that import has fully completed, we still get the missing-dependency error.
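
When the modules are added from a script rather than the portal, one way to deal with the asynchronous import is to poll until the first module is ready before adding the second. The helper below is a minimal sketch of that idea: Wait-ModuleProvisioned is a hypothetical function name, and treating "Succeeded" and "Failed" as the terminal states is an assumption based on the ProvisioningState property returned by Get-AzureRmAutomationModule.

```powershell
# Hypothetical helper: polls a caller-supplied script block until it reports
# that the module import has finished.
function Wait-ModuleProvisioned {
    param(
        # Script block that returns the current provisioning state as a string
        [Parameter(Mandatory = $true)]
        [scriptblock] $GetState,

        [int] $MaxAttempts = 30,
        [int] $DelaySeconds = 10
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $state = & $GetState
        if ($state -eq 'Succeeded') { return $true }
        if ($state -eq 'Failed') { throw "Module import failed." }
        Start-Sleep -Seconds $DelaySeconds
    }

    # Still not ready after all attempts
    return $false
}

# Assumed usage (requires the AzureRM.Automation cmdlets and a logged-in session):
# $ready = Wait-ModuleProvisioned -GetState {
#     (Get-AzureRmAutomationModule `
#         -ResourceGroupName '<resource group of the Automation account>' `
#         -AutomationAccountName '<Automation account name>' `
#         -Name 'AzureRM.profile').ProvisioningState
# }
```

Only once this returns true for AzureRM.profile would AzureRM.HDInsight be added.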

Scheduling the Runbook

Scheduling is as simple as linking the Runbook to a Schedule and supplying the parameter values.
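
The same linking can also be scripted with the AzureRM.Automation cmdlets. The following is a sketch under a few assumptions: the Runbook has been given parameters named SubscriptionName, ResourceGroupName, ClusterName and Nodes, the session is already logged in to Azure, and the schedule name and the 20:00 start time are made up for illustration.

```powershell
# All names below are placeholders or hypothetical; adjust to your environment.
$automation = @{
    ResourceGroupName     = '<resource group of the Automation account>'
    AutomationAccountName = '<Automation account name>'
}
$scheduleName = 'ScaleDownHdiEvening'   # hypothetical schedule name
$runbookName  = '<runbook name>'

# Daily schedule; first run tomorrow at 20:00, local time of the machine running this script
New-AzureRmAutomationSchedule @automation `
    -Name $scheduleName `
    -StartTime (Get-Date '20:00').AddDays(1) `
    -DayInterval 1

# Link the Runbook to the schedule and supply its parameter values
Register-AzureRmAutomationScheduledRunbook @automation `
    -RunbookName $runbookName `
    -ScheduleName $scheduleName `
    -Parameters @{
        SubscriptionName  = '<Subscription Name>'
        ResourceGroupName = '<resource group of the HDInsight cluster>'
        ClusterName       = '<cluster name>'
        Nodes             = 1
    }
```

A second schedule with a larger Nodes value could scale the cluster back up before business hours.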

Azure Automation @ Enterprise?

Enterprises can drastically reduce the cost of maintaining systems if they start using automation. Automation can be done without Azure Automation, of course; PowerShell has been around for a long time. The advantage of Azure Automation is that it combines a scalable infrastructure with an automation language. Automation authors don't need to worry about where the code is going to run: just write the code and hand it to Azure to execute.
Automation can even be used as part of a multi-tenant application to isolate security boundaries. One such mechanism is Webhooks. High-privilege tasks such as creating an HDInsight cluster can be restricted to an Azure AD app, and Automation can run under that identity. Only applications that know the secret Webhook URL can invoke the Runbook and get the job done. The application doesn't need to know infrastructure details such as the virtual network name or subnets; all of those can stay local to the Automation account.

One of my favorite slogans in software engineering is 'Solve our problems before we solve others'. Azure Automation really helps in that context.
