Tuesday, June 28, 2016

Code quality in legacy projects - Different approach

Dealing with a legacy project or code base is tricky. The same problem exists in other engineering disciplines, such as civil engineering. Think of a civil engineer who is asked to do something to an old house, such as adding an extension or building an upper floor. Should he spend time fixing issues in the existing house, or concentrate on the new construction? If he spends more time on the existing house, for example on strengthening it, that work may not be visible to the stakeholder, i.e. the house owner. But it is required before the extension can be carried out. Convincing the house owner how much to spend, and where, is a difficult job for the engineer.

Let's come to software engineering. An engineer is asked to add features to a project that has legacy code. At first sight of the code, we can see that it does not follow any basic coding principles: classes with thousands of lines, every method public, everything coupled to everything else, and so on. How do we convince the stakeholders that it requires refactoring before we start on the new features?

Sometimes even in new projects we feel that developers are spending more time fixing issues because of bad code, and that the code requires refactoring. But how can we identify which areas require refactoring? The ideal case is to refactor the entire application, but most of the time, if the code base is large, that is neither practical nor cost effective.

Let's go back to civil engineering. If the civil engineer has proof such as cracks in the walls, leaks or thin walls, he can present it to the owner and get approval for strengthening. But how can a software engineer find such proof to justify refactoring?

The answer is that he needs to find the hot-spots, where the code is hot.

Cold code

Let's understand what cold code is. This is code which is dead, unused, rarely changed, or which just works! In legacy systems we often see that more than 50-60% of the code is not changing, even though it is badly written. Sometimes that code is never called, or even if it is called, it does its duty. Bad code doesn't mean that it's not doing its duty. Bad code here refers to wrongly structured code which is difficult to understand and maintain, where maintaining means adding features and fixing defects. If the code is working as expected, should we change it just to keep up with the coding standards and to make the code beautiful?

Can we write perfect code?

Programming is closer to art than to engineering. In art there is no perfection. Today's perfection may not be tomorrow's, and it also changes from person to person. So whatever we do to beautify our code today, after some time we will feel that it is legacy and needs refactoring. Another reason is that in programming there are many ways to achieve the same thing. One person thinks one way is great; the next person thinks a different way is great.

Unless we have unlimited funding, it's not practical to refactor a large code base entirely. So we need to find the hot-spots where refactoring adds value.

Hot code

What is hot code? We can say that it's the code which needs to be refactored first. Below are some examples. We need the help of the source control system to find the hot-spots where hot code is located.

Code that is continuously changing

If a portion of code is changing continuously, we can say it's hot code. Remember the open/closed principle: code should be open for extension but closed for modification. Continuously modified code is not following that principle, and refactoring that area will help us in the long term.
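
As a rough illustration, change frequency can be counted from the source control history. The sketch below assumes Git and the text output of `git log --name-only --pretty=format:` (file paths, one per line, with blank lines between commits). The function names `countChangesPerFile` and `topHotSpots` and the sample paths are made up for this example.

```typescript
// Hypothetical helper: counts how many commits touched each file.
// Input: text produced by `git log --name-only --pretty=format:`.
function countChangesPerFile(gitLogOutput: string): Map<string, number> {
    const counts = new Map<string, number>();
    for (const rawLine of gitLogOutput.split("\n")) {
        const file = rawLine.trim();
        if (file.length === 0) {
            continue; // blank separator between commits
        }
        counts.set(file, (counts.get(file) || 0) + 1);
    }
    return counts;
}

// Returns the most frequently changed files first.
function topHotSpots(counts: Map<string, number>, limit: number): [string, number][] {
    return Array.from(counts.entries())
        .sort((a, b) => b[1] - a[1])
        .slice(0, limit);
}

// Made-up history of three commits.
const sampleHistory = [
    "src/Serializer.ts",
    "src/User.ts",
    "",
    "src/Serializer.ts",
    "",
    "src/Serializer.ts",
    "src/Order.ts",
].join("\n");

console.log(topHotSpots(countChangesPerFile(sampleHistory), 2));
```

A file that tops this list release after release is a candidate hot-spot. The count alone does not say why it changes, so it is a starting point for discussion, not a verdict.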

Code that is changed by different teams

If a portion of code is being changed by multiple functional teams, we can smell one more area of hot code. Ideally, if that portion followed the Single Responsibility Principle, only one functional team would be touching it. Another problem when multiple teams touch the same code is lack of ownership. If something breaks in that area, nobody takes responsibility because many people are working on it. Refactoring that portion helps to avoid both problems.

The best example is a serializer or other generic utility being changed by multiple teams. If a team working on the 'User Registration' feature is changing the serializer, we can easily tell that the serializer is not generic and needs refactoring.
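
A similar sketch can list how many teams touched each file. It assumes Git output from `git log --name-only --pretty=format:@%an`, where each commit starts with a line holding `@` plus the author name, followed by the changed files. The author-to-team map and all names here are hypothetical; a real project would need its own mapping.

```typescript
// Hypothetical helper: collects the set of teams that changed each file.
// Lines starting with "@" carry the commit author; other non-blank lines are file paths.
// Assumption: no file path in the repository starts with "@".
function teamsPerFile(
    gitLogOutput: string,
    teamOfAuthor: Map<string, string>
): Map<string, Set<string>> {
    const teams = new Map<string, Set<string>>();
    let currentTeam = "unknown";
    for (const rawLine of gitLogOutput.split("\n")) {
        const line = rawLine.trim();
        if (line.length === 0) {
            continue;
        }
        if (line.charAt(0) === "@") {
            currentTeam = teamOfAuthor.get(line.substring(1)) || "unknown";
            continue;
        }
        let seen = teams.get(line);
        if (seen === undefined) {
            seen = new Set<string>();
            teams.set(line, seen);
        }
        seen.add(currentTeam);
    }
    return teams;
}

// Made-up authors, teams and history.
const teamMap = new Map<string, string>([
    ["alice", "Billing"],
    ["bob", "User Registration"],
    ["carol", "Reporting"],
]);
const teamHistory = [
    "@alice", "src/Serializer.ts", "",
    "@bob", "src/Serializer.ts", "src/Registration.ts", "",
    "@carol", "src/Serializer.ts",
].join("\n");

teamsPerFile(teamHistory, teamMap).forEach((teamSet, file) => {
    console.log(file, "changed by", teamSet.size, "team(s)");
});
```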

As we have seen, these methods work by analyzing the history in the source control system.

Bug prediction

Even if we present the above proof, stakeholders may not agree to refactoring. So we need to explain the other advantages of finding hot-spots. One is bug prediction.

The concept is simple. If some code keeps changing and the reason for the changes is defect fixes, we can predict that new changes in that area will introduce more defects unless we refactor. If we refactor that area, we can reduce the cost of defect fixing. Stakeholders always look at money and at the quality delivered to the end user. If defects are reduced, which improves quality for the end user, they will agree to refactor.

Below are some links on how Google used this source-control-based technique.
http://google-engtools.blogspot.com/2011/12/bug-prediction-at-google.html
http://www.google.com/patents/US20140033174

After it was implemented, there were studies on whether this really helps developers. They found that it cannot tell the developer what exactly the bug is. It just says there is a chance of a bug in the file because it is changed frequently.
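
As I read the linked Google post, each file gets a score that is the sum, over its bug-fixing commits, of 1 / (1 + e^(-12t + 12)), where t is the commit time normalized so that 0 is the start of the repository history and 1 is now. Recent fixes therefore count far more than old ones. The sketch below follows that reading. The `BugFixCommit` shape and the function name are assumptions for this example, and how a commit is recognized as a bug fix (for example from its message) is left out.

```typescript
// Hypothetical input shape: one entry per (bug-fixing commit, file) pair.
interface BugFixCommit {
    file: string;
    timestamp: number; // any monotonically increasing unit, e.g. seconds since epoch
}

// Sums a time-decayed weight per file. A fix made "now" contributes 0.5,
// a fix from the very start of the history contributes almost nothing.
function bugPredictionScores(
    fixes: BugFixCommit[],
    repoStart: number,
    now: number
): Map<string, number> {
    const scores = new Map<string, number>();
    const span = now - repoStart;
    for (const fix of fixes) {
        // Normalize the commit time to the range 0..1.
        const t = span > 0 ? (fix.timestamp - repoStart) / span : 1;
        const weight = 1 / (1 + Math.exp(-12 * t + 12));
        scores.set(fix.file, (scores.get(fix.file) || 0) + weight);
    }
    return scores;
}

// Made-up data: two recent fixes in one file, one very old fix in another.
const sampleFixes: BugFixCommit[] = [
    { file: "src/Serializer.ts", timestamp: 1000 },
    { file: "src/Serializer.ts", timestamp: 1000 },
    { file: "src/Order.ts", timestamp: 0 },
];
console.log(bugPredictionScores(sampleFixes, 0, 1000));
```

Files with the highest scores are the ones to show to stakeholders first when arguing for refactoring.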

Below are some implementations of the Google bug prediction algorithm.
https://github.com/igrigorik/bugspots
https://github.com/niedbalski/python-bugspots

Tools

There are other tools as well for analyzing source control systems and extracting information.
http://people.engr.ncsu.edu/ermurph3/papers/icsm11.pdf
Code Maat

Moral of the story

Next time, do not just say that we need to refactor the entire code base and then stand without an answer when stakeholders ask why we need to refactor and what the estimate is.
Instead, present proof that these are the hot-spots we need to refactor and that this is how much time we need.

Tuesday, June 21, 2016

Orchestration library for .Net 4.0

Introduction

It started with queued processing in one of our projects. We have a queue framework for long-running processes. Anyone who needs to use the queue must host a web service, which the queue framework calls after de-queuing a message. The queue type and service URL need to follow a convention so that the framework can make the connection.

Problem

In a technology shop where developers follow principles such as SOLID, this problem would never occur. Here, in a delivery shop, the problem was that people started writing 2000+ lines in the service method that does the long-running operation.

Maybe the developers were securing their jobs. Soon the code base became the monopoly of the corresponding developers. No other developer was able to touch it.

It went unnoticed for a couple of years. It became a problem soon after people left the team and we started getting failures in really long-running (more than 1 hour) queue processes. Those were supposed to resume, but none of them resumed from the failure point on requeue. It became a nightmare to make them resumable.

Why did you develop it in a sequential way? The developers answered: this is the way we know. We need to perform a sequence of operations on the de-queued message. The output of one operation may affect a subsequent operation; sometimes they are independent. If required, we can separate them into functions.

SRP

It is clear that the developers were not following the Single Responsibility Principle. Separating the steps into methods doesn't help here, because some long-running operations share steps. So it's time to enforce SRP.

Another problem is the way the queue framework hands control to the actual long-running program. It was a simple service call based on convention, which encouraged developers to write a service and put all the logic inside the implementation method.

The solution is for the queue framework to accept step objects and call them one by one instead of calling a service method. After each step the Orchestrator can store the state, so that it can restart from the last completed step. The steps can also be unit tested easily.
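
The idea can be sketched as follows. This is written in TypeScript only for illustration, and the names `Step`, `ProgressStore` and `SketchOrchestrator` are hypothetical; it is not the API of the library linked at the end of this post. It assumes progress is saved as the index of the next step to run.

```typescript
// A unit of work that receives the shared context.
interface Step<TContext> {
    name: string;
    execute(context: TContext): void;
}

// Hypothetical persistence for progress. A real one would write to a database or file.
interface ProgressStore {
    loadNextStepIndex(): number;
    saveNextStepIndex(index: number): void;
}

class InMemoryProgressStore implements ProgressStore {
    private nextStepIndex = 0;
    loadNextStepIndex(): number {
        return this.nextStepIndex;
    }
    saveNextStepIndex(index: number): void {
        this.nextStepIndex = index;
    }
}

class SketchOrchestrator<TContext> {
    private steps: Step<TContext>[];
    private store: ProgressStore;

    constructor(steps: Step<TContext>[], store: ProgressStore) {
        this.steps = steps;
        this.store = store;
    }

    // Runs the steps in order, starting from the first step that has not completed yet.
    run(context: TContext): void {
        for (let i = this.store.loadNextStepIndex(); i < this.steps.length; i++) {
            // If a step throws, progress is not saved, so a later run retries this step.
            this.steps[i].execute(context);
            this.store.saveNextStepIndex(i + 1);
        }
    }
}
```

On requeue, calling `run` again with the same store skips the steps that already completed. In a real system the context would have to be persisted along with the step index, which this sketch leaves out.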

The Orchestrator was born here.

Mutability - Pipelined operations on same object context

Another concern was managing state. These operations shared the same state, and sometimes one required the output of a previous step, not necessarily the immediately previous one.

Mutability is considered evil everywhere and people strive for immutable systems. But here it was a little difficult to achieve that, so we continued with mutability.

But how to control the flow better?

Stateless steps

Why should a step ever be stateful if its Execute(context) method gets the context via a parameter? If the Orchestration engine passes the same context object to every step, each step can modify that state, and there is no parallel step execution, we are good. Aren't we?

Yes, we are good as long as there is no parallelism.
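
A minimal TypeScript sketch of stateless steps working on one mutable context is shown here. The `ImportContext` shape and the step names are invented for illustration. Note that the last step reads values written by an earlier, not immediately preceding, step.

```typescript
// Hypothetical shared context; every step reads from and writes to it.
interface ImportContext {
    rawLines: string[];
    parsed: number[];
    total: number;
    summary: string;
}

interface ImportStep {
    execute(context: ImportContext): void;
}

// The steps hold no fields of their own, so the same instances can serve any message.
const parseStep: ImportStep = {
    execute(context) {
        context.parsed = context.rawLines.map((line) => parseFloat(line));
    },
};
const sumStep: ImportStep = {
    execute(context) {
        context.total = context.parsed.reduce((sum, value) => sum + value, 0);
    },
};
const summaryStep: ImportStep = {
    execute(context) {
        // Uses rawLines from the initial message and total from sumStep.
        context.summary = context.rawLines.length + " lines, total " + context.total;
    },
};

// Sequential execution only; nothing here is safe to run in parallel.
function runSequentially(steps: ImportStep[], context: ImportContext): ImportContext {
    for (const step of steps) {
        step.execute(context);
    }
    return context;
}

function newContext(rawLines: string[]): ImportContext {
    return { rawLines: rawLines, parsed: [], total: 0, summary: "" };
}
```

Because all state lives in the context, running the same step objects against two different contexts does not leak anything between them.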

Concurrency & Parallel step execution

Parallel processing is always difficult to get right. The first question is who decides what runs in parallel: the Orchestrator framework or the step authors? It's a debatable topic, and most probably we will end up letting the step developers say what runs in parallel. What will happen if developers who don't know SRP define the unit of parallelism?

Most of the time stability is better than speed. So no concurrency as of now.

Source

Being an architect, I can suggest the solution, but I cannot enforce it as long as I don't have the right to hold a release. But there is no problem in developing such a framework and open sourcing it.

https://github.com/joymon/Orchestration

Note: This is for .Net 4.0 users. If the target is .Net 4.5, there is a better library called Banzai.

Tuesday, June 14, 2016

TypeScripting AngularJS 1.x - Routing

Even though we say we are developing web apps, navigation is really important in them. It is always advisable to have separate URLs for different screens, even though the entire page won't reload when we change screens.
In AngularJS there are mainly 2 ways to achieve routing. One is the routing library that comes with the framework (ngRoute) and the other is ui-router. Comparing them is a big topic and outside the scope of this post. Here we are going to see how routing can be set up in the TypeScript way using the library that comes with the Angular framework.

There are TypeScript definitions for the ngRoute library. If we add them, we get the routing-related types such as ng.route.IRouteProvider. As always, we can still set up the routes in JavaScript style inside a TypeScript file. Since every valid JavaScript is valid TypeScript, there won't be any compilation issue. The problem is maintainability if it's a big app. If we change a route or pass a wrong argument, there won't be any warning or error in the JavaScript way. But if it's written in the TypeScript way, the compiler can flag wrong parameter types or names. If we are using Visual Studio, the TypeScript way also shows suggestions for the available options when we configure routing.

Let's see how it's done in code.

JavaScript way

angular.module('HRApp').config(['$routeProvider',
    function routes($routeProvider) {
        $routeProvider
            .when('/add', {
                template: '<add-employee></add-employee>',
                caseInsensitiveMatch: true
            })
            .when("/Company/:companyId/Employee/:empId", {
                template: '<employee-detail></employee-detail>'
            })
            .otherwise({
                template: '<employee-detail></employee-detail>'
            });
    }]);
Here, how do we know which properties are available on the object being passed to the .when() method? Whatever we pass, it never complains. But a wrong property never gives us the required behavior either. We may need to run the app 2-3 times to make sure the spelling and everything else are correct. Let's see it in TypeScript.

TypeScript way

module HRApplication {
    "use strict";
    export class HRModule {
        app: ng.IModule;
        constructor() {
            this.app = angular.module("HR", ["ngRoute"]);
        }
        setupServices(): void {
        }
        setupDirectives(): void {
        }
        setupRoutes(): void {
            this.app.config(["$routeProvider",
                function ($routeProvider: ng.route.IRouteProvider) {
                    $routeProvider.when("/add", new AddEmployeeRoute())
                        .when("/Company/:companyId/Employee/:empId", new EmployeeDetailRoute())
                        .when("/Company/:companyId", new EmployeeDetailRoute())
                        .otherwise(new EmployeeDetailRoute());
                }]);
        }
    }
    var hrApp: HRModule = new HRModule();
    hrApp.setupServices();
    hrApp.setupDirectives();
    hrApp.setupRoutes();
}

What about AddEmployeeRoute and the other route classes? Those should implement ng.route.IRoute. Below is the code.
class AddEmployeeRoute implements ng.route.IRoute {
    template = "<add-employee></add-employee>";
    caseInsensitiveMatch = true;
}
class EmployeeDetailRoute implements ng.route.IRoute {
    template = "<employee-detail></employee-detail>";
}


The advantage here is that if we implement IRoute, we get IntelliSense for the available properties such as caseInsensitiveMatch. If we write it in the JavaScript way, we never get a chance to ensure it's spelled correctly unless we run the application.
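
What the compiler actually checks can be seen in a small standalone sketch. `RouteOptions` below is a made-up stand-in for a subset of ng.route.IRoute, so the snippet runs without the Angular typings; the class and variable names are also invented. The failing lines are left commented out with the reason they would not compile.

```typescript
// Stand-in for a subset of ng.route.IRoute (assumption: both properties are optional).
interface RouteOptions {
    template?: string;
    caseInsensitiveMatch?: boolean;
}

// A class implementing the interface: wrong property TYPES are rejected.
class AddEmployeeRouteSketch implements RouteOptions {
    template = "<add-employee></add-employee>";
    caseInsensitiveMatch = true;
    // caseInsensitiveMatch = "yes"; // would not compile: string is not assignable to boolean
}

// An object literal typed as the interface: unknown property NAMES are rejected too.
const employeeDetailRouteSketch: RouteOptions = {
    template: "<employee-detail></employee-detail>",
    // caseInsensitiveMach: true, // would not compile: property does not exist in RouteOptions
};

console.log(new AddEmployeeRouteSketch().caseInsensitiveMatch, employeeDetailRouteSketch.template);
```

One caveat: since every property of the interface is optional, a misspelled property inside the class would simply be treated as an extra member. If catching misspelled names matters, a typed object literal gives the stricter check.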

Happy TypeScripting...

Tuesday, June 7, 2016

TypeScript - Static Constructor

TypeScript doesn't have built-in support for static constructors. But when we code applications, there will be many scenarios where we need to execute something once for a class. Below is one way to work around this in TypeScript.
module Company.Application.Feature.SubFeature {
    "use strict";
    export class MyClass {
        private static _constructor:void = (() => {
            var obj: MyClass = new MyClass();
            obj.setup();
        })();
        constructor() {
        }
        setup() {
        }
    }
}

How it works

The _constructor initializer is an IIFE (immediately invoked function expression), i.e. the parameterless arrow function is executed when the script is loaded, and its result, undefined, is assigned to the static _constructor property.

Difference with .Net static constructor

In .Net the static constructor is kind of lazy, i.e. it executes when the class is first referenced, either for object creation or when a static member is accessed. This workaround in TypeScript, however, executes the static constructor soon after the script is loaded.
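
If the lazy behavior matters, one possible variation is to guard the one-time setup with a static flag and trigger it from the constructor and from each static method. This is only a sketch; the class name `LazyInitClass` and its members are invented for illustration.

```typescript
class LazyInitClass {
    private static initialized = false;
    // Exposed only so the example can show how often the setup ran.
    static setupCount = 0;

    // Runs the one-time setup on first use instead of at script load.
    private static ensureInitialized(): void {
        if (LazyInitClass.initialized) {
            return;
        }
        LazyInitClass.initialized = true;
        LazyInitClass.setupCount++;
    }

    constructor() {
        LazyInitClass.ensureInitialized();
    }

    static describe(): string {
        LazyInitClass.ensureInitialized();
        return "setup ran " + LazyInitClass.setupCount + " time(s)";
    }
}
```

The cost is that every entry point has to remember to call `ensureInitialized()`, which the IIFE approach avoids.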