Tuesday, October 6, 2015

Azure WebJobs - Internals and how tos

This is my attempt to index the features of Azure WebJobs in a single place, mainly for my own reference.

What is Azure WebJobs

It is a feature of the Azure cloud, more specifically of WebApps, for executing long-running operations in a platform- and language-agnostic way. It can run executable files continuously, on demand or on a schedule. We can give .bat, .ps1, .exe and other executable files to an Azure WebJob, which runs them as per the configuration. It uses an engine called Kudu to achieve this.

WebJobs by itself doesn't know how to execute a job/function on events, such as when a message arrives in an Azure storage queue or a blob is uploaded. We need to write our own logic in the executable or script to listen for these events and execute the corresponding actions.

If we are writing a .Net exe as a WebJob and want to listen to queues, blobs etc., the WebJobs SDK helps us write it better. It is nothing but a collection of assemblies (.dll files) which our .Net project can reference.

What is Azure WebJobs SDK

A .Net based framework which we can use to write triggerable functions. The SDK calls our functions when the trigger fires. It performs many useful actions before and after our function executes. Some examples of such actions are

  • Binds the values from the queue or blob to the parameters of our function.
  • Marks the message as completed (deletes it from the queue) if our function didn't throw an exception.
  • If the execution fails 5 times (configurable), it moves the message to a poison queue.
The important thing to note here is that this SDK is not coupled with Azure, as it is just a listening mechanism, and we can extend it to listen to our own custom events and invoke the handler functions.
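
The delete-on-success and poison-queue behaviour listed above can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's actual code: the in-memory deques, `process_queue` and `flaky_handler` are hypothetical names, and a real queue tracks the dequeue count on the message itself.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 5  # assumption: mirrors the configurable limit of 5 mentioned above


def process_queue(queue, poison_queue, handler, max_dequeue_count=MAX_DEQUEUE_COUNT):
    """Drain `queue`, retrying failed messages and parking repeat offenders."""
    attempts = {}
    while queue:
        message = queue.popleft()
        attempts[message] = attempts.get(message, 0) + 1
        try:
            handler(message)  # our function; no exception means success
        except Exception:
            if attempts[message] >= max_dequeue_count:
                poison_queue.append(message)  # give up after the last attempt
            else:
                queue.append(message)  # make it available again for a retry
        # on success the message is not re-queued, i.e. it is deleted


def flaky_handler(message):
    # Hypothetical handler that always fails for one particular message.
    if message == "bad":
        raise ValueError("cannot process " + message)


work = deque(["good", "bad"])
poison = deque()
process_queue(work, poison, flaky_handler)
```

After the run, `work` is empty and `poison` holds only the message that failed on every attempt.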


http://azure.microsoft.com/blog/2014/10/25/announcing-the-1-0-0-rtm-of-microsoft-azure-webjobs-sdk/

How the .exe or script is deployed on Azure machines

WebJobs is a feature of Azure WebApps, and the WebApp owns the job executable. Executables are stored inside the \\app_data\jobs\ folder, in a subfolder per job type (continuous or triggered) and, within that, a folder named after the job.

Using Azure WebJobs SDK outside Azure

A program using the SDK can run outside of Azure and still listen to Azure.

https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk/#workerrole

Listening to our custom events and invoking functions

There is support for invoking the functions manually using the JobHost.Call() method, but there is no option for adding our own polling/listening logic. The polling needs to be done outside of the SDK, in our own code, which then just calls the function.

Ideally, if we are writing our own polling logic which does not poll Azure resources, we don't need the WebJobs SDK. But if we are migrating from another queue system to Azure queues, it helps with a step-by-step migration: we can first migrate the functions to the SDK format while still listening to the old queue technology, then in the next phase start listening to the Azure queue.
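
The migration idea above, where our own loop polls the old queue and hands each message to a handler that is already written in the final shape, might look like the sketch below. It is a Python illustration under assumptions, not SDK code: `poll_legacy_queue` and `process_order` are hypothetical names, the standard library queue stands in for the legacy queue system, and `invoke` plays the role that JobHost.Call() would play in a .Net exe.

```python
import queue


def process_order(message):
    """Handler written once; it does not care where the message came from."""
    return "processed " + message


def poll_legacy_queue(legacy_queue, invoke):
    """Our own polling loop: pull from the old queue and invoke the handler."""
    results = []
    while True:
        try:
            message = legacy_queue.get_nowait()
        except queue.Empty:
            break  # a real poller would sleep and poll again instead of stopping
        results.append(invoke(message))
    return results


legacy = queue.Queue()
legacy.put("order-1")
legacy.put("order-2")
results = poll_legacy_queue(legacy, process_order)
```

When the second phase arrives, only the polling side is replaced; the handler stays as it is.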

2 types of triggers

There is one triggering mechanism for Azure WebJobs themselves, and a different triggering mechanism for invoking a function written using the WebJobs SDK.

Triggering Azure WebJobs

Even if we forget about the WebJobs SDK, this still applies, as WebJobs is a feature of Azure WebApps and is not tied to the WebJobs SDK. It uses the Kudu engine to trigger Azure WebJobs. We can write jobs in technologies other than .Net. I am really thankful to MSFT for opening their eyes to see the whole world.

There are mainly 3 different types of invocation.
  • Continuous - Azure will make sure one instance of our WebJob exe or script is always running.
  • OnDemand - Azure will run our exe whenever a request is received. Once it's started it can run for long. Azure will kill it if it's idle for the configured time.
  • Run on Schedule - Azure will trigger execution of the WebJob (.exe or supported script file) on the schedule. It uses Azure Scheduler to invoke the WebJob.

If we write a .Net console exe and use it as the WebJob, the WebJobs SDK comes into the picture. Alternatively, we can write our own logic in the exe to monitor the queues and start the corresponding function for processing.

Triggering functions inside a .Net exe written using the WebJobs SDK

If we use the SDK, we just write functions, and the SDK calls them on different triggers. The SDK can listen to changes in Azure Storage queues, blobs etc. and, based on the function registration, call our functions.

If the exe is not running, there is no way our function will get executed, regardless of whatever SDK we use.

Azure WebJobs SDK magic

The WebJobs SDK does a lot of magic to make the life of developers easier, so that they can concentrate on their business logic rather than on hosting and plumbing work.

Function based

Even if we don't know the class concept in OOP, we can write WebJobs, as the WebJobs SDK simply needs a function which will be executed on certain conditions.

Attribute based

We need to tell the SDK what we want for it to do the magic, and that is done via attributes. The queue and blob names to listen to, and the type of the object stored in the queue message, all need to be specified through attributes and the function's parameters.

Serialization

The SDK deserializes the queue data for us, so that we simply get the object as a parameter when our function gets called.
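
As a rough picture of what this saves us from writing by hand, the sketch below turns the raw message text into a typed object before the handler is called. It is illustrative Python, assuming JSON as the wire format; `OrderMessage`, `dispatch` and `handle_order` are hypothetical names and not part of the SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class OrderMessage:  # hypothetical message type
    messageId: str
    quantity: int


def dispatch(raw_body, handler):
    """Turn the raw queue text into a typed object, then call the handler."""
    message = OrderMessage(**json.loads(raw_body))
    return handler(message)


def handle_order(message):
    # The handler only ever sees the typed object, never the raw text.
    return message.quantity * 2


result = dispatch('{"messageId": "m-1", "quantity": 21}', handle_order)
```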

Token replacement

We can use tokens inside the attributes, and the SDK will replace them automatically. If we want a new blob for logging, named after the messageId property of our custom message object, we just use log/{messageId} as the blob path. The SDK will then look for a messageId property in the message object and replace the token with its actual value.
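
The substitution itself is simple to picture. Below is a minimal Python sketch of the idea, not the SDK's implementation: `resolve_tokens` is a hypothetical helper, the message is a plain dictionary, and the path is written with a forward slash, as blob paths are.

```python
import re


def resolve_tokens(template, message):
    """Replace each {name} token with the matching property of `message`."""
    def lookup(match):
        name = match.group(1)
        if name not in message:
            raise KeyError("no property named " + name)
        return str(message[name])

    return re.sub(r"\{(\w+)\}", lookup, template)


blob_path = resolve_tokens("log/{messageId}", {"messageId": "m-42", "quantity": 3})
```

Here `blob_path` ends up as log/m-42, and a token with no matching property raises an error instead of being silently left in the path.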

Storage Management

We just need to tell the blob name to the SDK via attributes. It will make sure the blob is there for us. 

Convention

The SDK expects 2 config entries by convention: AzureWebJobsDashboard and AzureWebJobsStorage. The SDK uses these to connect to our Azure storage account.

Things to remember

Multiple function executions in WebJobs SDK

There is a chance that the WebJobs SDK will call our function multiple times for the same queue item, as part of retrying and handling failures. So we need to make sure that even if the function executes twice, it does no damage.
The recommended approach is to keep our own state for the queued item and update it when execution completes. Have a check at the beginning of the function to make sure we are not redoing the operation.
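
A minimal sketch of that check, in Python for brevity. The `completed` set stands in for durable state (for example a table keyed by message id), and `handle` and `side_effects` are hypothetical names; in a real job the state must live outside the process.

```python
completed = set()   # stand-in for durable state keyed by message id
side_effects = []   # records the work that was really performed


def handle(message_id, payload):
    """Do the work only once per message id, however often we are called."""
    if message_id in completed:
        return "skipped"  # already done in an earlier execution
    side_effects.append(payload)  # the real work
    completed.add(message_id)     # record completion after the work
    return "done"


first = handle("m-1", "charge card")
second = handle("m-1", "charge card")  # simulated retry of the same message
```

The second call returns "skipped" and the work is recorded only once. If the process dies between doing the work and recording completion, the operation can still run twice, so the work itself should be safe to repeat wherever possible.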

Do not use static variables

When we register our functions with the Azure WebJobs SDK, it invokes them on different threads, but in the same AppDomain. This is the same as how ASP.Net handles requests using different threads. If we use static variables, different threads may manipulate the same value and we will end up in deep trouble.
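
The problem can be made visible with two threads and one shared variable. The sketch below is Python rather than .Net, with a module-level variable playing the role of a static field. All names are hypothetical, and the barrier is only there to force the overlap that in production happens at random.

```python
import threading

current_message = None          # shared, like a static field
barrier = threading.Barrier(2)  # makes both invocations overlap on purpose


def unsafe_handler(message, out):
    global current_message
    current_message = message    # both threads write the same variable
    barrier.wait()               # by now the other thread has written too
    out.append(current_message)  # one thread reads the other's value


def safe_handler(message, out):
    local_message = message      # state lives inside the invocation
    barrier.wait()
    out.append(local_message)


def run(handler):
    out = []
    threads = [threading.Thread(target=handler, args=(m, out)) for m in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(out)


unsafe_results = run(unsafe_handler)
safe_results = run(safe_handler)
```

With the shared variable, both invocations report the same message, so one message is effectively lost. With local state, each invocation reports its own.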

Do not use .Net lock

As everybody knows, the lock keyword works within a single process. It has no effect on another process or another machine, and in Azure it is easy to scale out to multiple instances, i.e. different machines.

So shortcuts such as lock, taken while migrating legacy code into WebJobs, might not work well when scaled. The issues may pop up inconsistently, so it is better to avoid static variables and lock.

Don't write intermediate state to local files

This is basic to any scalable background processing mechanism: after a failure, the work may be picked up by another machine, which cannot see the local files and so will not resume from where the work failed last time.

How tos - WebJobs SDK

https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-storage-queues-how-to/
http://joymonscode.blogspot.com/2015/09/azure-webjobs-one-webjob-sdk-exe.html
