Tuesday, November 14, 2017

C# async and await with Thread static

This post is a continuation of the post below about async and await. That post discusses how async and await evolved and covers the basics. The same sample context is used here as well, so it is better to read that post before continuing.

http://joymonscode.blogspot.com/2015/06/c-async-and-await-programming-model.html

Async Await and calling thread behavior

There is no question that async and await simplify reasoning about the code. But they can cause trouble if we use them without understanding how they work. Let us see one example below.

public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            // Not awaited on purpose, so the loop moves on while the factorial is computed.
            WriteFactorialAsyncUsingAwait(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
    // Keep the process alive until the continuation has printed its result.
    Console.ReadLine();
}
private async Task WriteFactorialAsyncUsingAwait(int facno)
{
    Console.WriteLine($"WriteFactorialAsyncUsingAwait() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    int result = await Task.Run(() => FindFactorialWithSimulatedDelay(facno));
    Console.WriteLine($"WriteFactorialAsyncUsingAwait() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Factorial of {facno} is {result}");
}


Guess what thread ids will be printed from WriteFactorialAsyncUsingAwait(). Will they be the same?

Those who say they will be the same should be prepared to spend nights and weekends debugging, especially if something marked ThreadStatic is used both before and after the await. Below goes the output.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingAwait() - Thread Id - 1 - Begin
4
WriteFactorialAsyncUsingAwait() - Thread Id - 3 - Factorial of 3 is 6

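The code after the await (the continuation) ran on a different thread, just as it does in the Task<> based equivalent. This is what happens in a console application, which has no SynchronizationContext; in a UI application the continuation is normally posted back to the UI thread. Anything stored in a ThreadStatic field before the await may therefore not be there after it.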

Task.ContinueWith and calling thread behavior

Let's see its Task<> based implementation.

public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            WriteFactorialAsyncUsingTask(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
    Console.ReadLine();
}
private void WriteFactorialAsyncUsingTask(int no)
{
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    Task<int> task=Task.Run<int>(() =>
    {
        int result = FindFactorialWithSimulatedDelay(no);
        return result;
    });
    task.ContinueWith(new Action<Task<int>>((input) =>
    {
        Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Factorial of {no} is {input.Result}");
    }));
}

See the output. It works the same way as the async await model.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingTask() - Thread Id - 1 - Begin
4
WriteFactorialAsyncUsingTask() - Thread Id - 4 - Factorial of 3 is 6

Why is the consuming code written inside the ContinueWith({}) callback instead of reading the result from the Task.Result property? If it is not done via ContinueWith({}), execution waits on the task.Result line, so the outer loop cannot move to the next item. We lose all the benefits of Task then. Below is the code that accesses task.Result; see how it blocks the outer loop from executing in parallel.



public void Main()
{
    Console.WriteLine($"Main() - Thread Id - {Thread.CurrentThread.ManagedThreadId}");
    for (int counter = 1; counter < 5; counter++)
    {
        if (counter % 3 == 0)
        {
            WriteFactorialAsyncUsingTask(counter);
        }
        else
        {
            Console.WriteLine(counter);
        }
    }
    Console.ReadLine();
}
private void WriteFactorialAsyncUsingTask(int no)
{
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - Begin");
    Task<int> task=Task.Run<int>(() =>
    {
        int result = FindFactorialWithSimulatedDelay(no);
        return result;
    });
    Console.WriteLine($"WriteFactorialAsyncUsingTask() - Thread Id - {Thread.CurrentThread.ManagedThreadId} - End - Task Result - {task.Result}");
}

The output below clearly shows that 4 is processed by the loop only after 3 is processed. We lost the parallelism. The thread ids are the same before and after.

Main() - Thread Id - 1
1
2
WriteFactorialAsyncUsingTask() - Thread Id - 1 - Begin
WriteFactorialAsyncUsingTask() - Thread Id - 1 - End - Task Result - 6
4

Moral of the story

Though async await seems easy to use, usage without understanding will take away our sleep and weekends.

Tuesday, November 7, 2017

Who is responsible for making APIs / apps secure?

If we are from a software engineering background, we immediately say it is the developers. Some enterprise people will go further and say that it is the combined duty of infrastructure and development, since infra has to set up proper VPN access, firewalls etc... If we limit the scope to public internet applications, we pretty much hear one answer: it is the duty of developers, developers, developers. Maybe some will say it is architecture too. But mostly, if it is not at enterprise level, application architecture is part of development.

There are some problems with leaving the duty of security to developers

  • Developers always focus, or have to focus, on application features.
  • Developers are not experts in the security field. They may not be, and are not supposed to be, up to date with all the security vulnerabilities found out in the world.

There could be more problems we could think of. So what is the solution in the unsecured world of IT?

One simple answer is to free developers from the security aspect and give it to security experts. Hiring one security expert and having that person look at every line of code produced is not a great idea either. So what to do? Is buying or renting a security product / service a viable option? It seems more viable than betting on developers to secure the applications. Those products or services are often referred to as API Management Gateways.

What do these API Management Gateways do?

Suppose a developer leaves a SQL injection hole, it is missed in the testing stages and it reaches production. These gateways are expected to block the SQL injection attack by inspecting the payload/traffic. Similarly, other attacks are also supposed to be handled by the gateway before they reach the application servers.
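
To make the idea of payload inspection a bit more concrete, here is a deliberately naive sketch in JavaScript. The function name inspectRequest and the three patterns are assumptions made up for illustration; they do not represent how any particular gateway product works, and real products rely on far richer rule sets.

```javascript
// Toy patterns for illustration only; far from a complete rule set.
const suspiciousPatterns = [
  /'\s*(or|and)\s+[\w']+\s*=\s*[\w']+/i, // e.g. ' OR '1'='1
  /union\s+select/i,                      // e.g. UNION SELECT ...
  /;\s*(drop|delete|update|insert)\s/i    // e.g. ; DROP TABLE ...
];

// Hypothetical check that a gateway might run on every request parameter
// before the request is forwarded to the application servers.
function inspectRequest(params) {
  for (const [name, value] of Object.entries(params)) {
    if (suspiciousPatterns.some((pattern) => pattern.test(String(value)))) {
      return { allowed: false, reason: 'Suspicious value in parameter "' + name + '"' };
    }
  }
  return { allowed: true, reason: '' };
}

console.log(inspectRequest({ user: 'alice' }).allowed);        // true
console.log(inspectRequest({ user: "1' OR '1'='1" }).allowed); // false
```

The point is only that the check lives outside the application, so a hole left in the application code can still be covered. It does not replace parameterized queries in the application itself.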

Below are some links of basics and list of players in the application or API security management market.

http://www.forumsys.com/product-solutions/api-security-management/
https://www.roguewave.com/products-services/akana/solutions/api-security
http://www.apiacademy.co/resources/api-management-lesson-201-api-security/

Most of the players are cloud ready. Even cloud providers such as Microsoft Azure have their own offerings to secure applications in Cloud.

Will these gateways reduce performance?

Nothing comes for free. If someone claims a gateway adds zero delay, they are wrong. Somewhere, instructions have to execute to validate the traffic and take a decision. Vendors can make this fast enough that it is not visible from the outside. In order to speed things up they often use dedicated appliances or hardware instead of commodity servers.

Comparison

Below are some sites where different products are compared. They might not be accurate as of today, but they are a good starting point.

https://www.itcentralstation.com/categories/api-management#top_rated
http://transform.ca.com/API-Management-Platform-Vendor-Comparison.html

Is this the silver bullet?

There is no silver bullet in software engineering or in science. Whatever is best at the time and situation adopt it. Embrace change when new better ways are available. 

If the application is highly sensitive, betting on one gateway to protect it will not be a great solution, because that gateway might not keep up with our security requirements. For example, if a zero day attack is found and the gateway vendor takes days to release an updated version while our business needs a fix within the next hour, we should probably let our developers take care of security. Maybe we will soon end up building another gateway, but it is worth doing.

Tuesday, October 31, 2017

Exposing Parquet file to SQL 2016 as well as Hadoop (Java/Scala)

This is just an architecture post explaining the possibility of exposing a Parquet file to a SQL 2016 database via polybase while other applications access it normally. The other applications can be anything, such as data analytics code running in a Hadoop cluster.

This kind of integration is mainly needed when we already have a transaction database such as SQL Server and we have to analyze its data. Either we can have scheduled data movement using ETL technologies, or we can use polybase to move data from an internal normal table to an external polybase table which is backed by a parquet file. If the solution is in Azure, the parquet file can be somewhere in storage. Once the data is there in parquet file format, the analytics algorithms can hit the same file. Parquet is mentioned here because of its familiarity in the analytics community.

Below goes such an architecture.
Since the architecture may change over time, a LucidChart diagram is embedded. Please comment if this is not working. Thanks to LucidChart for their freemium model.

Details on implementation such as code snippets are good to share in separate post.

Tuesday, October 24, 2017

Caution JavaScript Ahead - setTimeout() or setInterval() doesn't make JavaScript multi threaded

Anyone born into JavaScript knows what it means that JavaScript is single threaded. Developers coming from other languages also know that JavaScript is single threaded. But some APIs in JavaScript lead them to think that JavaScript is multi threaded. Below is such a situation.

setTimeout(() => {
  console.log('Timeout hit and time is ' + new Date());
}, 1000);
console.log('setTimeout at ' + new Date());

There could be any number of reasons for someone to want to execute code after a specified time. It might be an initialization delay, rendering time, session timeout handling etc... But is this going to solve the problem by executing the code exactly after 1 second (1000ms)?

Any real application will have more code than these 2 lines. Suppose someone else has written the code below, and it gets executed after we register the handler via setTimeout().

setTimeout(() => {
  console.log('Timeout hit and time is ' + new Date());
}, 1000);
console.log('setTimeout at ' + new Date());
problemMaker()

function problemMaker() {
  //Any sync AJAX call or some code which execute for long time.
  var url = 'https://httpbin.org/delay/5';
  var request = new XMLHttpRequest();
  request.open('GET', url, false);
  request.send();
  document.writeln(request.responseText.length); 
}

Does this ensure that the function gets executed after 1 second? Native JavaScript developers can immediately identify the issue. Others may think it will work. Let's see a test run result in the console.

setTimeout at Tue Oct 24 2017 19:46:14 GMT-0400 (Eastern Daylight Time)
Timeout hit and time is Tue Oct 24 2017 19:46:29 GMT-0400 (Eastern Daylight Time)
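
The same delay can be reproduced without any network call. Below is a minimal sketch, assuming it runs in Node.js or a browser console; busyWait() is a hypothetical helper that stands in for problemMaker() and simply keeps the thread busy.

```javascript
// Hypothetical helper: keeps the single JavaScript thread busy for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; nothing else can run on this thread in the meantime
  }
}

const events = [];
const scheduledAt = Date.now();

// Ask for the callback after 10 ms.
setTimeout(() => {
  events.push('timer');
  console.log('Timer fired after ' + (Date.now() - scheduledAt) + ' ms');
}, 10);

// Block for 200 ms. The timer becomes overdue but cannot interrupt this code.
busyWait(200);
events.push('sync done');
console.log('Events so far: ' + JSON.stringify(events)); // the timer has not run yet
```

The first line printed shows only "sync done" in the list. The timer callback gets its turn only after the currently running code has finished, so it reports roughly 200 ms instead of the requested 10 ms.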

Yes, JavaScript is single threaded whatever we do with the setTimeout or setInterval functions. The delay we pass is only the earliest time the callback can run, so it is better not to trust them on when exactly they are going to execute. Code written like this may work on a development machine and fail in higher environments such as business testing, staging or production. It is a highly inconsistent issue. Let's avoid saying "It works on my machine".
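
One way to avoid relying on the timer is to measure, inside the callback, how much time has really passed. This is only a sketch; latenessMs() and the 50 ms tolerance are names and numbers assumed for illustration.

```javascript
// Hypothetical helper: how many milliseconds later than requested did a callback run?
function latenessMs(scheduledAt, requestedDelayMs, firedAt) {
  return Math.max(0, firedAt - scheduledAt - requestedDelayMs);
}

const requestedDelayMs = 1000;
const startedAt = Date.now();

setTimeout(() => {
  const late = latenessMs(startedAt, requestedDelayMs, Date.now());
  // 50 ms is an arbitrary tolerance chosen for this sketch.
  if (late > 50) {
    console.log('Callback ran ' + late + ' ms late; the thread was busy.');
  } else {
    console.log('Callback ran roughly on time.');
  }
}, requestedDelayMs);
```

For something like session timeout handling, comparing timestamps this way tells us the real elapsed time even when the callback itself was held up.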

Sample code located at https://plnkr.co/edit/uexi2U

Tuesday, October 17, 2017

Running multiple instances of AzCopy.exe command

AzCopy.exe is really an amazing tool for data transfer. But if we run multiple instances of AzCopy at the same time, we may get the error below.

AzCopy Command - AzCopy /Source:c:\temp\source /Dest:https://<storage account>.blob.core.windows.net/test /DestSAS:"<SAS>" /pattern:"" /s

An error occurred while reading the restart journal from "C:\Users\<user name>\AppData\Local\Microsoft\Azure\AzCopy". Detailed error: The process cannot access the file 'C:\Users\<user name>\AppData\Local\Microsoft\Azure\AzCopy\AzCopyCheckpoint.jnl' because it is being used by another process.

The error is pretty clear. AzCopy keeps a journal file for its resume functionality. If we don't specify the journal file location in the command, it uses the default location, and when a second AzCopy instance starts it cannot read the journal file because the first instance is still using it.

The fix is to specify the location for .jnl. AzCopy Command goes as follows
AzCopy /Source:c:\temp\source /Dest:https://<storage account>.blob.core.windows.net/test /DestSAS:"<SAS>" /pattern:"" /s /z:<unique folder for azcopy command>

If we are running AzCopy from the command window, the problem is easy to find. But if AzCopy is invoked in parallel from applications (PowerShell or .Net), it is difficult to find because we might have disabled all the messages using /y. AzCopy has a /v: switch which redirects the logs to a file. That will help to troubleshoot.
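
When the parallel commands are generated by code, the unique journal folder can be generated along with them. Below is a small sketch in JavaScript that only builds the command lines and does not run them. It uses just the switches mentioned in this post; buildAzCopyCommand(), the job-<n> folder naming and the azcopy.log file name are assumptions for illustration.

```javascript
// Builds the AzCopy command line for one parallel job.
// The journal folder is derived from the job index so that no two instances share it.
function buildAzCopyCommand(job, index, journalRoot) {
  const journalDir = journalRoot + '\\job-' + index; // hypothetical folder layout
  return [
    'AzCopy',
    '/Source:' + job.source,
    '/Dest:' + job.dest,
    '/DestSAS:"' + job.sas + '"',
    '/s',
    '/y',
    '/z:' + journalDir,                  // separate journal per instance
    '/v:' + journalDir + '\\azcopy.log'  // separate log per instance
  ].join(' ');
}

const jobs = [
  { source: 'c:\\temp\\source1', dest: 'https://<storage account>.blob.core.windows.net/test1', sas: '<SAS>' },
  { source: 'c:\\temp\\source2', dest: 'https://<storage account>.blob.core.windows.net/test2', sas: '<SAS>' }
];

const commands = jobs.map((job, index) => buildAzCopyCommand(job, index, 'c:\\temp\\azcopy-journals'));
commands.forEach((command) => console.log(command));
```

Paths containing spaces would need quoting, which this sketch leaves out.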

Tuesday, October 3, 2017

Using .Net default parameter values is trouble, especially for APIs

Just because a feature exists does not mean it is meant to be used. JavaScript at least has a book named "JavaScript: The Good Parts", but other languages don't have one. Let's see one scenario in C# .Net.

Long long ago there was an API exposed to clients.

        class MyAPI
        {
            public void APIMethod(string s, int i = 10)
            {
                Console.WriteLine($"Inside APIMethod with i = {i}");
            }
        }
        // Client code: relies on the default value i = 10.
        internal void Test()
        {
            MyAPI apiClient = new MyAPI();
            apiClient.APIMethod("hi");
        }

Clients were happy using it. Later someone joined the API team and overloaded the method as follows, thinking that keeping the same name would help the clients discover the API.

        class MyAPI
        {
            public void APIMethod(string s, int i = 10)
            {
                Console.WriteLine($"Inside APIMethod with i = {i}");
            }
            public void APIMethod(string s)
            {
                Console.WriteLine($"Inside APIMethod");
            }
        }

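Clients happily adopted the new version of the API. But soon they started feeling that their calls were not working as expected. They escalated the issue, and developers spent hours and days before finally figuring out what went wrong: once recompiled against the new version, a call such as apiClient.APIMethod("hi") binds to the new one-parameter overload, because C# overload resolution prefers a method for which no default value has to be filled in. The original method with i = 10 is silently bypassed. They corrected the code as follows.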

        class MyAPI
        {
            public void APIMethod(string s, int i = 10)
            {
                Console.WriteLine($"Inside APIMethod with i = {i}");
            }
            public void NewAPIMethod(string s)
            {
                Console.WriteLine($"Inside NewAPIMethod");
            }
        }

Moral of the story

Do not use a feature only because it is available.

Tuesday, September 26, 2017

What is wasb(s) protocol for accessing Azure blob storage

WASB Protocol

I didn't get a chance to study the wasb(s) protocol before I had to use it for HDInsight related tasks, so it was trial and error in the beginning. That led me to decide to write a detailed post on the wasb protocol, since it helps us especially when we are using Hadoop in the Azure world.

But before writing, I thought of googling for similar work. Why should there be one more post on the internet with the same content? The results were interesting. Below are the links explaining what the wasbs protocol for accessing Azure blob storage is.

https://blogs.msdn.microsoft.com/cindygross/2015/02/03/why-wasb-makes-hadoop-on-azure-so-very-cool/
https://blogs.msdn.microsoft.com/cindygross/2015/02/04/understanding-wasb-and-hadoop-storage-in-azure/

But these did not answer all the questions I had, so I decided to add those here.

Can .Net access the wasbs:// url?

.Net really doesn't need to access Azure blob storage via the wasbs protocol. It can access it via the SDK using the corresponding object model, or it can access it using the https protocol.
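
The two url styles point to the same blob and differ only in where the container name sits. The sketch below shows the mapping in JavaScript; wasbToHttps() is a hypothetical helper, and the account name myaccount and container name data are made up for the example.

```javascript
// Converts  wasb(s)://<container>@<account>.blob.core.windows.net/<path>
// to        https://<account>.blob.core.windows.net/<container>/<path>
function wasbToHttps(wasbUrl) {
  const match = /^wasbs?:\/\/([^@\/]+)@([^\/]+)(\/.*)?$/.exec(wasbUrl);
  if (!match) {
    throw new Error('Not a wasb(s) url: ' + wasbUrl);
  }
  const [, container, host, path = ''] = match;
  return 'https://' + host + '/' + container + path;
}

console.log(wasbToHttps('wasbs://data@myaccount.blob.core.windows.net/logs/2017/app.log'));
// https://myaccount.blob.core.windows.net/data/logs/2017/app.log
```

The https url still needs authorization, for example a SAS token or the storage account key, unless the container allows public access.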

References