Tuesday, July 26, 2016

What is a side effect of a function in programming?

Introduction

This article explains what a side effect of a function is and the advantages of avoiding side effects.

Background

We have all worked with object-oriented programming for years, and our normal thinking is to design applications to resemble the real world as closely as possible. We use object-oriented design principles to achieve that. But should we really make everything an object with its own class-level properties? Here we investigate whether there is a better way to write maintainable code.

What is a side effect of a function

When we write a function or method, what is our intention? It does a job. It may or may not accept parameters. Sometimes it returns a value. That is the effect of the function invocation.
But there are situations where a function does things beyond that job, changing state that lives outside the function itself. Look at the code below.

//
// ES6 code which has a function with side effect.
//
class Calculator {
  constructor() {
    this.op = "add";
  }
  calculate(n1, n2) {
    if (this.op === "add") {
      this.op = "sub";
      return n1 + n2;
    } else {
      this.op = "add";
      return n1 - n2;
    }
  }
}


What is the effect here? As consumers, when we call calculate(), the effect we expect is the calculated value. But what is actually happening?

We get the calculated value. But in the meantime the function also changes a class-level variable, which we as consumers don't expect.

That makes the next call give a different output for the same input values. Such changes are considered side effects of the function. A side effect becomes visible when we look from the consumer's point of view.

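Below is the code that invokes the function 3 times with the same input.

var calc = new Calculator();
alert(calc.calculate(20, 12)); // 32: op was "add", now flipped to "sub"
alert(calc.calculate(20, 12)); // 8: op was "sub", now flipped back to "add"
alert(calc.calculate(20, 12)); // 32 again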

Drawbacks of functions with side effects

Functions with side effects make it difficult to reason about code. What is reasoning about code? It means simulating in our head how the code works. With side effects that is harder, because we need to remember the state along with the steps. This slows development and often causes bugs, as it is difficult to think through all the possibilities.

Side effects also put constraints on calling order, especially when multiple threads use the same object to call its methods. When such scenarios occur, the bug is difficult to reproduce on the developer's machine. Normally, when developers encounter these threading issues, they simply put a lock around the critical section. That gives us the behavior of old single-core processors: the program cannot take advantage of multiple cores because the locks make the threads wait for each other. Locks may lead to deadlocks as well.

What else brings side effects to a function

Changing global state is not the only thing considered a side effect. Whatever we do to the state of the outside world is a side effect. Consider the modified code.

// ES6 code which has a function with a side effect. It doesn't alter a class-level variable.
class MyCalculator {
  calculate(n1, n2, op) {
    var result;
    if (op === "add") {
      result = n1 + n2;
    } else {
      result = n1 - n2;
    }
    console.log("success");
    return result;
  }
}

This looks side-effect free. But if we look at the logging call, we can see it changes the outside world, so the function has a side effect. If somebody has run the code below before invoking the above method, only alternate calls will succeed.

// Redirect the console.log
var shouldSuccess = true;
console.realLog=console.log;
console.log = function(msg) {
  if (shouldSuccess) {
    shouldSuccess = !shouldSuccess;
    console.realLog(msg);
  } else {
    shouldSuccess = !shouldSuccess;
    throw "I am tired!!"
  }
};

In real scenarios there might be situations similar to this, and they will cause the function to fail inconsistently.

Can we write everything without side effects

The world is stateful. It's not possible to write every method in production code to be side-effect free. But we should try to minimize side effects as much as possible, especially when we are writing sequential algorithms, transformations etc. We can also use immutable objects to avoid risks.
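
As a minimal sketch of what minimizing side effects could look like for the calculator above: the standalone calculate() function and the settings object are hypothetical names introduced only for illustration. The operation is passed in explicitly, and Object.freeze guards the shared settings against accidental changes.

```javascript
"use strict";

// Hypothetical side-effect-free variant of calculate(): the operation is an
// explicit parameter, and nothing outside the function is read or modified.
function calculate(n1, n2, op) {
  return op === "add" ? n1 + n2 : n1 - n2;
}

// Object.freeze makes the object immutable; in strict mode an attempt to
// assign to one of its properties throws a TypeError.
const settings = Object.freeze({ defaultOp: "add" });

const first = calculate(20, 12, settings.defaultOp);
const second = calculate(20, 12, settings.defaultOp);
console.log(first, second); // the same input gives the same output on every call
```

Because the result depends only on the arguments, the calls can be made in any order, or from several callers, without one call changing what the next one returns.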

Points of Interest

There are concepts such as pure functions and idempotent functions which are related to side effects. Functional programming also tries to address side effects to an extent.

Tuesday, July 19, 2016

Communication between WPF Windows App & HTML Page hosted in WebBrowser control

Context

This can be considered a continuation of an old post on this same blog. That post shows the technique for establishing communication between an HTML page inside a WebBrowser control and a .Net application. In this post we are going to see the approach in detail. The code is available for download and is self-explanatory.

Technical details


  • For WPF to JS communication
    • WebBrowser control has a method called InvokeScript.
    • This can be used to invoke any JavaScript function present in the rendered web page and pass parameters.
  • For Web page/JS to WPF .Net communication
    • WebBrowser has a property called ObjectForScripting which will be available in JS as window.external
    • If we set an object to ObjectForScripting property, that .Net object will be available in JS and from JS we can call any method on that object which will in turn call the .Net code.
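
The JavaScript side of both directions could look roughly like the sketch below. setSomeText() and SetAValue() are the names used in the downloadable sample; the "output" element id, the lastTextFromHost variable and the sendToHost() helper are assumptions made only for this illustration. It is written in ES5 style because the WebBrowser control may render the page in an older Internet Explorer document mode.

```javascript
// Remembers the last value pushed from the WPF side (illustrative only).
var lastTextFromHost = null;

// WPF -> JS: the host would call this through
// WebBrowser.InvokeScript("setSomeText", text).
function setSomeText(text) {
  lastTextFromHost = text;
  // "output" is a hypothetical element id; skipped when there is no DOM.
  if (typeof document !== "undefined") {
    var target = document.getElementById("output");
    if (target) {
      target.innerText = text;
    }
  }
  return text;
}

// JS -> WPF: inside the WebBrowser control, window.external is the object
// assigned to ObjectForScripting. The optional host parameter allows a
// stand-in object to be passed when running outside the control.
function sendToHost(value, host) {
  var target = host || (typeof window !== "undefined" ? window.external : null);
  if (!target) {
    return false;
  }
  target.SetAValue(value);
  return true;
}
```

Passing the host object as a parameter is just a design convenience here: it lets the page logic be exercised with a fake object before it is hosted inside the WPF window.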

What’s not done

This just proves the technique. Performance testing is not part of this.

Source code

  • The HTML page is inside the WPF application.
  • WebBrowser loads the HTML page by using the pack: URI syntax.
  • WebBrowser's ObjectForScripting object is set when the window loads.
  • A class which is ComVisible is used for the above property.
  • From JavaScript, SetAValue() is called to pass data from JS to WPF.
  • For WPF to JS, myWebOC.InvokeScript() is invoked.
  • In JS the data is received in the setSomeText() function.
Download from the below url.

Tuesday, July 12, 2016

Using GitHub API to show our projects in personal sites or blog

Introduction

There are still debates going on about whether GitHub is a developer's resume or not. Whatever the outcome, it is good for software engineers to show on their personal site the list of GitHub projects they contribute to.

Integrating the GitHub API into www.JoymonOnline.in

The diagram below shows how I have integrated GitHub into my personal site.

www.JoymonOnline.in is a web site

Most of us know the difference between a web site and a web app. In this case we are dealing with a web site, where the HTML is generated on the server side and sent to the client. The client browser is supposed to just render the markup given by the web server.

What the GitHub API looks like

It's just a URL. More specifically, REST URLs which point to resources. Below is one example API URL. It gives details about the repository joyful-visualstudio. Just copy and paste the URL into the browser address bar and hit Enter. The data will be displayed.

How to call GitHub API

Coming to the programming aspects: on this site the API is called using the WebClient class. This is a simple class in the .Net framework which can be used to work with web sites and web APIs.

Authentication / Security

If the request is not authenticated, we can still perform read operations, but there is a limit on such calls. That limit is based on the IP address from which the call originates. In this case the call originates from the ASP.Net code of the http://joymononline.in web site. The site is hosted on a shared hosting plan, which means there are other sites on the same server machine. They may also call GitHub without authentication; from GitHub's point of view all those calls originate from the same IP address, so they count against the same limit.

To get rid of that problem we can authenticate the requests with a token. The simple way is to obtain a token from GitHub, keep it in the site and send it in a request header. The header name used is 'Authorization'. Now the limit is per token, it is much higher, and there is no need to worry about shared hosting.
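
The site itself uses the .Net WebClient class; purely as an illustration in JavaScript, the request described above could be assembled as in the sketch below. The buildRepoRequest() helper, the "some-owner" account and the "personal-site" User-Agent value are assumptions for this example and do not come from the site's code.

```javascript
// Builds the URL and headers for reading one repository from the GitHub API.
// The token is optional; without it the request is anonymous and is counted
// against the per-IP limit.
function buildRepoRequest(owner, repo, token) {
  const headers = {
    "Accept": "application/vnd.github.v3+json",
    // GitHub expects a User-Agent on API requests; this value is a placeholder.
    "User-Agent": "personal-site"
  };
  if (token) {
    headers["Authorization"] = "token " + token;
  }
  const url = "https://api.github.com/repos/" +
    encodeURIComponent(owner) + "/" + encodeURIComponent(repo);
  return { url: url, headers: headers };
}

// "some-owner" is a placeholder account name. Reading the token from an
// environment variable (when running under Node) keeps it out of the source.
const env = typeof process !== "undefined" ? process.env : {};
const request = buildRepoRequest("some-owner", "joyful-visualstudio", env.GITHUB_TOKEN);
console.log(request.url);
```

The returned object can then be handed to whatever HTTP client is available on the server side, for example fetch(request.url, { headers: request.headers }).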

Securing API token

Make sure the token is not public. In JoymonOnline, whose source is hosted in a public repository on GitHub itself, the token is not visible. It's inserted at deployment time by the AppVeyor CI solution.

.Net Client SDK

GitHub also has a client SDK to make the interaction easier. If we use the client SDK we don't need to worry about the low-level, HTTP-related WebClient class.

Tuesday, July 5, 2016

Published .Net Orchestration library to nuget via AppVeyor

Introduction

There is no need to explain the importance of the nuget package ecosystem to the .Net development community. Nowadays it's very difficult to consider someone a .Net developer if they don't have a nuget package. Of course I can say that now because I just published my first nuget package :)

It's the same Orchestration library I hosted some months back.

Standard way

How to package a .Net library (.dll or scripts) as a nuget package and publish it to www.nuget.org is documented very clearly on the nuget web site. Link below.

http://docs.nuget.org/create/creating-and-publishing-a-package

So I followed the same guidelines and published it, all manually, using Visual Studio and the command line tool nuget.exe. It's available at the location below.

https://www.nuget.org/packages/Orchestration

CI & CD way

Publishing manually every time from our development machine is a really boring task. It's highly repetitive. Also, how do we ensure that the library actually works after each change? If we cannot solve our own problem, how can we solve others' problems?

As everybody knows, the answer to this problem is continuous integration. If it is hosted, it's even easier. If it's free too, don't ask any more questions, just do it.

As we have seen in some previous posts, AppVeyor is the best place for free hosted CI & CD for .Net projects at this time. I highly recommend it over Microsoft's Visual Studio Online service, which gives some free build minutes. In fact, Microsoft itself uses AppVeyor for integration.

Below is the official AppVeyor documentation on how to set up CI for nuget projects.

http://www.appveyor.com/docs/nuget#automatic-publishing-of-nuget-projects
http://blog.appveyor.com/blog/2014/02/21/nuget-support-in-appveyor-ci

Compiling the project and running unit tests are trivial. There usually won't be any issues at all. But when we want to automatically create the nuget package and increment its version, we may need to think twice.

There are basically 2 options. The first option is kind of obsolete now.
  1. Explicitly run our own script to auto increment the nupkg file version and fire the nuget pack command to create the .nupkg file
    1. http://www.codeproject.com/Tips/806257/Automating-NuGet-Package-Creation-using-AppVeyor
    2. https://code.msdn.microsoft.com/Create-and-Push-NuGet-c6072402
  2. Use AppVeyor's out of the box settings to auto increment the version.
    1. Enable Assembly patching in AppVeyor
    2. Use $version$ token in version property inside .nuspec file
    3. Enable the nuget build in AppVeyor to automatically package the file.
We can take the second approach, which is out of the box and doesn't require any coding. Remember: less code, fewer defects! Once this is working fine we can see the .nupkg file in the build artifacts. From there we can push via AppVeyor itself. We need to set up our nuget API key either in appveyor.yml (encrypted) or in the web interface.

I had a little trouble in the beginning doing things without custom scripts and contacted their support. They helped immediately. I really appreciate it.
http://help.appveyor.com/discussions/problems/525-nupkg-getting-created-at-same-version-every-time

Tuesday, June 28, 2016

Code quality in legacy projects - Different approach

Dealing with a legacy project or code base is a tricky thing. The issue exists in other engineering disciplines, such as civil engineering, too. Think about a civil engineer who wants to do something to an old house. It may be adding an extension or building an upper floor. Should he spend time fixing the issues in the existing house or concentrate on the new construction? If he spends more time on the existing house, such as strengthening it, that work may not be visible to the stakeholder, i.e. the house owner. But it is required to carry out the extension. Convincing the house owner how much to spend and where to spend it is a difficult job for the engineer.

Let's come to software engineering. An engineer is asked to add features to a project which has legacy code. At first sight of the code, we can see it doesn't follow any basic principles of coding: classes with thousands of lines, all methods written as public, everything coupled to everything else etc. How do we convince the stakeholders that it requires refactoring before we start the new features?

Sometimes even in new projects we feel that developers are spending more time fixing issues caused by bad code and that it requires refactoring. But how can we identify which areas require refactoring? The ideal case is to refactor the entire application, but most of the time, if the code base is large, that's neither practical nor cost effective.

Let's go back to civil engineering. If the civil engineer has proof such as cracks in walls, leaks or thin walls, he can present it to the owner and get approval for strengthening. But how can a software engineer find such proof to justify refactoring?

The answer is that he needs to find the hot-spots, where the code is hot.

Cold code

Let's understand what cold code is. This is code which is dead, unused, rarely changed or that just works! In any legacy system we can see that more than 50-60% of the code is not changing even though it is written badly. Sometimes that code is not called at all, or even if it is called, it does its duty. Bad code doesn't mean that it's not doing its duty. Bad code here refers to wrongly structured code which is difficult to understand and maintain. Maintaining means adding features and fixing defects. If the code is working as expected, should we change it just to keep up with coding standards and to make the code beautiful?

Can we write perfect code

Programming is closer to art than engineering. In art there is no perfection. Today's perfection may not be tomorrow's. It also changes from person to person. So whatever we do to beautify our code today, after some time we will feel that it's legacy and needs refactoring. Another reason is that in programming there are many ways to achieve the same thing. One person thinks one way is great; the next person thinks a different way is great.

Unless we have unlimited funding, it's not practical to refactor a large code base entirely. So we need to find the hot spots where refactoring adds value.

Hot code

What is hot code? We can say that it's the code which needs to be refactored first. Below are some examples. We need the help of source control systems to find the hot spots where hot code is located.

Code that changes continuously

If a portion of code keeps changing, we can say it's hot code. Remember the open-closed principle. This continuously modified code is not following that principle, and refactoring that area will help us in the long term.

Code that is changed by different teams

If a portion of code is changed by multiple functional teams, we can smell one more area of hot code. Ideally, if that portion follows the Single Responsibility Principle, only one functional team will touch that code. Another problem when multiple teams touch the same code is lack of ownership: if something breaks in that area, nobody will take responsibility, as many people are working on it. Refactoring that portion helps avoid these problems.

The best example is a serializer or other generic utility that gets changed by multiple teams. If a team working on the 'User Registration' feature is changing the serializer, we can easily see that the serializer is not generic and needs refactoring.

As we have seen, these methods work by analyzing the source control system.
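
A minimal sketch of both measures, assuming the change history has already been exported from source control as a list of { file, team } records. The record shape, the file names and the team names are made up for illustration.

```javascript
// Counts, per file, how many changes it received and how many distinct
// teams made them. Files with the most changes come first.
function findHotSpots(changes) {
  const stats = new Map();
  for (const { file, team } of changes) {
    if (!stats.has(file)) {
      stats.set(file, { file: file, changes: 0, teams: new Set() });
    }
    const entry = stats.get(file);
    entry.changes += 1;
    entry.teams.add(team);
  }
  return Array.from(stats.values())
    .map((s) => ({ file: s.file, changes: s.changes, teams: s.teams.size }))
    .sort((a, b) => b.changes - a.changes || b.teams - a.teams);
}

// Hypothetical history: the serializer is touched by two feature teams.
const history = [
  { file: "Serializer.js", team: "User Registration" },
  { file: "Serializer.js", team: "Billing" },
  { file: "Serializer.js", team: "Billing" },
  { file: "Report.js", team: "Reporting" }
];
const hotSpots = findHotSpots(history);
console.log(hotSpots[0]); // the most frequently changed file
```

A file that ranks high on both counts is a candidate to present to stakeholders as a hot-spot.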

Bug prediction

Even if we present the above proof, stakeholders may not agree to refactoring. So we need to explain the other advantages of finding hot-spots. One is bug prediction.

The concept is simple. If some code keeps changing and the reason for the changes is defect fixes, we can predict that new changes in that area will introduce more defects unless we refactor. If we refactor that area, we can reduce the cost of defect fixing. Stakeholders always look at money and at quality for the end user. If defects are reduced, which improves quality for the end user, they will agree to refactor.

Below are some links on how Google used this source-control-based technique.
http://google-engtools.blogspot.com/2011/12/bug-prediction-at-google.html
http://www.google.com/patents/US20140033174

After implementation, there were studies on whether this really helps humans. They found that it cannot tell the developer exactly what the bug is. It just says there is a chance of bugs in the file because it changes frequently.

Below are some implementations of the Google bug prediction algorithm.
https://github.com/igrigorik/bugspots
https://github.com/niedbalski/python-bugspots
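
As I understand the Google post linked above, each bug-fixing commit to a file adds 1 / (1 + e^(-12t + 12)) to that file's score, where t is the commit time normalized to the range 0 (first commit in the repository) to 1 (now). A rough sketch of that scoring, with the bugScore() function name and its parameters being my own assumptions:

```javascript
// fixTimestamps: times of the bug-fixing commits that touched one file.
// firstCommitTime and now define the range used to normalize each time to 0..1.
function bugScore(fixTimestamps, firstCommitTime, now) {
  const span = now - firstCommitTime;
  return fixTimestamps.reduce((score, timestamp) => {
    // Treat everything as "now" if the repository has no history span yet.
    const t = span > 0 ? (timestamp - firstCommitTime) / span : 1;
    return score + 1 / (1 + Math.exp(-12 * t + 12));
  }, 0);
}

// A fix made right now contributes 0.5; a very old fix contributes almost nothing.
console.log(bugScore([100], 0, 100));
console.log(bugScore([0], 0, 100));
```

The weighting means recent fixes dominate the score, so a file that was buggy long ago but has been quiet since gradually drops off the list.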

Tools

There are other tools as well for analyzing source control systems and extracting information.
http://people.engr.ncsu.edu/ermurph3/papers/icsm11.pdf
Code Maat

Moral of the story

Next time, do not just say that we need to refactor the entire code base and then stand without an answer when stakeholders ask why we need to refactor and what the estimate is.
Instead, present proof that these are the hot-spots we need to refactor and that we need this much time.

Tuesday, June 21, 2016

Orchestration library for .Net 4.0

Introduction

It started with queued processing in one of the projects. We have a queue framework for long-running processes. Anyone who needs to use that queue should host a web service, which is called by the queue framework after de-queue. The queue type and service URL need to follow a convention to make the connection.

Problem

In a technology shop where developers follow principles such as SOLID, this problem would never occur. Here in the delivery shop, the problem was that people started writing 2000+ lines in the service method which does the long-running operation.

Maybe the developers were securing their jobs. Soon the code base became the monopoly of the corresponding devs. No other developers were able to touch it.

It went unnoticed for a couple of years. It became a problem soon after people left the team and we started getting failures on really long-running (more than 1 hour) queue processes. They were supposed to resume, but none of them resumed from the failure point on requeue. It became a nightmare to make them resumable.

Why did you develop it in a sequential way? The developers answered that this is the way they know: we need to do a sequence of operations on the de-queued message. The output of one operation may affect the subsequent operation; sometimes they are independent. If required, we can separate them into functions.

SRP

It is clear that the developers were not following the Single Responsibility Principle. Separating steps into methods doesn't help here, because some long-running operations share steps. So it's time to enforce SRP.

Another problem is the way the queue framework hands control to the actual long-running program. It was a simple service call based on convention. That made it easy for developers to write a service and put all the logic inside the implementation method.

The solution is for the queue framework to accept step objects and call them one by one instead of calling a service method. After each step the Orchestrator can store the state, so that it can restart from the last step. The steps can be unit tested easily.

The Orchestrator was born here.

Mutability - Pipelined operations on same object context

Another concern was managing the state. These operations shared the same state, and sometimes one required the output of a previous step, not necessarily the immediately previous one.

Mutability is considered evil everywhere and people strive for immutable systems. But here it was a little difficult to achieve that, so we continued with mutability.

But how to control the flow better?

Stateless steps

Why should a step ever be stateful if the Execute(context) method gets the context via a parameter? If the Orchestration engine passes the same context object to every step, each step can modify the state, and there is no parallel step execution, we are good. Aren't we?

Yes, we are good without parallelism.
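
The idea of stateless steps sharing one context, with a checkpoint after each step, can be sketched as follows. This is not the code of the actual .Net library; the class shape, the in-memory store and the step contents are assumptions made for illustration, and a real store would persist the checkpoint somewhere durable.

```javascript
// Runs steps one by one against a shared, mutable context and records how
// far it got, so that a re-run continues from the step that failed.
class Orchestrator {
  constructor(steps, store) {
    this.steps = steps;
    this.store = store; // hypothetical checkpoint store (plain object here)
  }

  run(context) {
    const start = this.store.nextStep || 0;
    for (let i = start; i < this.steps.length; i++) {
      this.steps[i].execute(context); // a step keeps no state of its own
      this.store.nextStep = i + 1;    // checkpoint after every successful step
    }
    return context;
  }
}

// Three hypothetical steps; the second one fails on its first attempt.
let failOnce = true;
const steps = [
  { execute(ctx) { ctx.log.push("load"); } },
  {
    execute(ctx) {
      if (failOnce) {
        failOnce = false;
        throw new Error("transient failure");
      }
      ctx.log.push("transform");
    }
  },
  { execute(ctx) { ctx.log.push("save"); } }
];

const store = {};
const orchestrator = new Orchestrator(steps, store);
const context = { log: [] };

try {
  orchestrator.run(context); // stops at the second step
} catch (error) {
  console.log("first attempt failed:", error.message);
}
orchestrator.run(context);   // resumes at the second step, "load" is not repeated
console.log(context.log);
```

Since each step receives everything it needs through the context parameter, a step can be unit tested by calling execute() with a hand-built context.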

Concurrency & Parallel step execution

Parallel processing is always difficult to get right. The first thing to decide is who defines what runs in parallel: the Orchestrator framework or the step authors? It's a debatable topic, and most probably we will end up with step developers telling what to run in parallel. What will happen if developers who don't know SRP define the unit of parallelism?

Most of the time stability is better than speed. So no concurrency as of now.

Source

Being an architect I can suggest the solution, but I cannot enforce it as long as I don't have the right to hold a release. But there is no problem in developing such a framework and open sourcing it.

https://github.com/joymon/Orchestration

Note: This is for .Net 4.0 users. If the target is .Net 4.5, there is a better library called Banzai.

Tuesday, June 14, 2016

TypeScripting AngularJS 1.x - Routing

Though we say we are developing web apps, navigation is really important in them. It is always advisable to have separate URLs for different screens, even though the entire page won't reload when we change screens.
In AngularJS there are mainly 2 ways to achieve routing. One is the ngRoute library that comes with the framework and the other is ui-router. Comparing them is a big topic and not within the scope of this post. Here we are going to see how routing can be set up the TypeScript way using the library that comes with the Angular framework.

TypeScript has type definitions for the ngRoute library. If we add them we get the routing-related types such as ng.route.IRouteProvider. As always, we can set up the routes in JavaScript style inside a TypeScript file. Since every valid JavaScript is valid TypeScript, there won't be any compilation issue. But the problem is maintainability if it's a big app. If we change a route or pass a wrong argument, there won't be any warning or error the JavaScript way. But if it's written the TypeScript way, it won't compile on wrong parameter types or names. If we are using Visual Studio, the TypeScript way can show suggestions for the available options when we configure routing.

Let's see how it's done in code.

JavaScript way

angular.module('HRApp').config(['$routeProvider',
    // Plain JavaScript style: $routeProvider is untyped, so nothing is checked.
    function routes($routeProvider) {
        $routeProvider
            .when('/add', {
                template: '<add-employee></add-employee>',
                caseInsensitiveMatch: true
            })
            .when("/Company/:companyId/Employee/:empId", {
                template: '<employee-detail></employee-detail>'
            })
            .otherwise({
                template: '<employee-detail></employee-detail>'
            });
    }]);
Here, how do we know which properties are available on the object passed to the .when() method? Whatever we pass, it never complains. But it may not give us the required behavior either. We may need to run the app 2-3 times to make sure the spelling and everything else are correct. Let's see it in TypeScript.

TypeScript way

module HRApplication {
    "use strict";
    export class HRModule {
        app: ng.IModule;
        constructor() {
            this.app = angular.module("HR", ["ngRoute"]);
        }
        setupServices(): void {
        }
        setupDirectives(): void {
        }
        setupRoutes(): void {
            this.app.config(["$routeProvider",
                function ($routeProvider: ng.route.IRouteProvider) {
                    $routeProvider.when("/add", new AddEmployeeRoute())
                        .when("/Company/:companyId/Employee/:empId", new EmployeeDetailRoute())
                        .when("/Company/:companyId", new EmployeeDetailRoute())
                        .otherwise(new EmployeeDetailRoute());
                }]);
        }
    }
    var hrApp: HRModule = new HRModule();
    hrApp.setupServices();
    hrApp.setupDirectives();
    hrApp.setupRoutes();
}

What about AddEmployeeRoute and the other route classes? They should implement ng.route.IRoute. Below is the code.
class AddEmployeeRoute implements ng.route.IRoute {
    template = "<add-employee></add-employee>";
    caseInsensitiveMatch = true;
}
class EmployeeDetailRoute implements ng.route.IRoute {
    template = "<employee-detail></employee-detail>";
}


The advantage here is that if we implement IRoute we get IntelliSense for the available properties such as caseInsensitiveMatch. The JavaScript way, we never get a chance to ensure a property is correctly spelled unless we run the application.

Happy TypeScripting...