Tuesday, October 28, 2014

Stateful or Stateless classes?

What is meant by the state of an object?

Before we discuss stateless or stateful classes, we should have a better understanding of what is meant by the state of an object. It is the same as the English meaning of state: "the particular condition that someone or something is in at a specific time."

When we come to programming and think about the condition of an object at a specific time, it is nothing but the values of its properties or member variables at a given point in time. Who decides what the properties of an object are? The class. Who decides what properties and members go inside a class? The programmer who coded that class. Who is the programmer? Everybody who reads this blog, including me who is writing this post. Are we all experts at deciding which properties each class needs?

I don't think so. At least it's true in the case of programmers in India who come into the software industry only by looking at the salary and treat programming as a daily job. First of all, it is not something that can be taught in colleges the way other engineering disciplines are. It needs to come via experience, because programming is in its early stages compared to other engineering disciplines and is more like art than engineering. Engineering can sometimes have hard rules, but art cannot. Even after being in programming for around 15 years (sorry, I count my college days in my programming experience as well), I still take a considerable amount of time to decide which properties a class needs and what the class itself should be named.

Can we bring some rules to deciding which properties are needed? In other words, what properties should the state of an object include? Or should objects always be stateless? Below are some thoughts on this area.

Entity classes / Business Objects

There are multiple names, such as entity classes and business objects, given to classes that represent a clear state of something. If we take the example of an Employee class, its sole purpose is to hold the state of an employee. What can that state contain? EmpId, Company, Designation, JoinedDate etc. I hope there is no confusion up to this point. Everybody agrees, without much argument, that this type of class should be stateful, because this is taught in college.

But how should we do the salary calculation?
  • Should CalculateSalary() be a method inside the Employee class?
  • Should there be a SalaryCalculator class containing the Calculate() method?
  • In case there is a SalaryCalculator class:
    • Should it have properties such as BasicPay, DA, HRA etc.?
    • Or should the Employee object be a private member variable of SalaryCalculator, injected via the constructor?
    • Or should SalaryCalculator expose a public Employee property (getEmployee/setEmployee methods in Java)?

Helper / Operation / Manipulator classes

This is the type of class that does a task. SalaryCalculator falls into this type. There are many names for this type, where classes perform actions, and they can be found in programs with many prefixes and suffixes, such as
  • class SomethingCalculator e.g. SalaryCalculator
  • class SomethingHelper e.g. DBHelper
  • class SomethingController e.g. DBController
  • class SomethingManager 
  • class SomethingExecutor
  • class SomethingProvider
  • class SomethingWorker
  • class SomethingBuilder
  • class SomethingAdapter
  • class SomethingGenerator
A long list can be found here. People have different opinions on which suffix to use in which situation. But our interest is something else.

Can we add state to this type of class? I would suggest keeping them stateless. Let's examine why I am saying 'no' in the rest of this post.

Hybrid classes

According to Wikipedia, encapsulation in object-oriented programming is "the packing of data and functions into a single component". Does this mean all the methods that manipulate an object should be in the entity class? I don't think so. The entity class can have state accessor methods such as GetName(), SetName(), GetJoiningDate(), GetSalary() etc.

But CalculateSalary() should be outside. Why is that?

According to the Single Responsibility Principle of SOLID, "A class should change for only one reason". If we keep the CalculateSalary() method inside the Employee class, that class will change for either of the 2 reasons below, which is a violation.
  • A state change in the Employee class, e.g. a new property has been added to Employee
  • A change in the calculation logic
I hope it's clear. Now we have 2 classes in this context: the Employee class and the SalaryCalculator class. How do they connect to each other? There are multiple ways. One is to create an object of the SalaryCalculator class inside the GetSalary method and call Calculate() to set the salary variable of the Employee class. If we do so, Employee becomes a hybrid, because it acts like an entity class and also initiates operations like a helper class. I really don't encourage this type of hybrid class. But in situations such as a Save method on an entity, this is kind of OK, provided the operation itself is delegated to another class.

Whenever you feel that your class is falling into this hybrid category, think about refactoring. If you feel that your classes do not fall into any of these categories, stop coding.

State in Helper / Manipulator class

What is the problem if our helper classes keep state? Before answering that, let's look at the different combinations of state a SalaryCalculator class can take. Below are some examples.

Scenario 1 - Primitive values


    class SalaryCalculator
    {
        public double Basic { get; set; }
        public double DA { get; set; }
        public string Designation { get; set; }
 
        public double Calculate()
        {
            //Calculate and return
            throw new NotImplementedException();
        }
    }

Cons

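There is a chance that the Basic salary is that of an Accountant while the Designation is "Director", which does not match at all. There is no enforced way to make sure that the values the SalaryCalculator works on are consistent with each other.

Similarly, if one instance is shared in a multi-threaded environment, one thread can overwrite the values another thread has just set, and the calculation will produce wrong results.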

Scenario 2 - Object as state


    class SalaryCalculator
    {
        public Employee Employee { get; set; }
 
        public double Calculate()
        {
            //Calculate and return
            throw new NotImplementedException();
        }
    }

Cons

If one SalaryCalculator object is shared by 2 threads and each thread is working on a different employee, the sequence of execution might be as follows, which causes logical errors.
  • Thread 1 sets the employee1 object
  • Thread 2 sets the employee2 object
  • Thread 1 calls the Calculate method and gets the salary of employee2
We can argue that the Employee dependency can be injected via the constructor and the property made read-only. Then we need to create a SalaryCalculator object for each and every Employee object. So it is better not to design your helper classes this way.
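To make the interleaving above concrete without depending on thread timing, the sketch below replays the three steps on a single thread, and also shows the constructor-injected, read-only alternative. It is a minimal sketch: the Employee shape, the `StatefulSalaryCalculator`, `ReadOnlySalaryCalculator` and `InterleavingDemo` names, and the Basic + HRA formula are all assumptions for illustration.

```csharp
using System;

// Minimal Employee entity for this sketch (assumed shape).
class Employee
{
    public string Name { get; set; } = "";
    public double Basic { get; set; }
    public double HRA { get; set; }
}

// Scenario 2: the employee is mutable state on the calculator.
class StatefulSalaryCalculator
{
    public Employee Employee { get; set; }

    public double Calculate()
    {
        // Placeholder formula (assumption): Basic + HRA.
        return Employee.Basic + Employee.HRA;
    }
}

// Constructor-injected variant: the employee cannot be swapped after
// construction, so every employee needs its own calculator instance.
class ReadOnlySalaryCalculator
{
    private readonly Employee employee;

    public ReadOnlySalaryCalculator(Employee employee)
    {
        this.employee = employee ?? throw new ArgumentNullException(nameof(employee));
    }

    public Employee Employee => employee;

    public double Calculate()
    {
        return employee.Basic + employee.HRA;
    }
}

static class InterleavingDemo
{
    // Replays the three-step sequence on one thread and returns the value
    // that "thread 1" observes when it asks for employee1's salary.
    public static double ReplaySharedCalculator(Employee employee1, Employee employee2)
    {
        var shared = new StatefulSalaryCalculator();
        shared.Employee = employee1; // Thread 1 sets employee1
        shared.Employee = employee2; // Thread 2 sets employee2
        return shared.Calculate();   // Thread 1 calculates: gets employee2's salary
    }
}
```

`ReplaySharedCalculator` hands employee2's salary to the caller that was working on employee1. The read-only version avoids that, but only by creating one calculator per employee, which is the trade-off described above.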

Scenario 3 - No state


    class SalaryCalculator
    {
        public double Calculate(Employee input)
        {
            //Calculate and return
            throw new NotImplementedException();
        }
    }


This is a near-perfect situation. But here one can argue that, if none of the methods use any member variable, what is the use of keeping it as a non-static class?
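As a rough check of the thread-safety side of Scenario 3, the sketch below shares a single stateless calculator across `Parallel.For` iterations. Because `Calculate` reads only its parameter, no iteration can see another iteration's employee. The `ConcurrencyDemo` name and the Basic + HRA formula are assumptions, not part of the original design.

```csharp
using System.Threading.Tasks;

// Minimal Employee entity for this sketch (assumed shape).
class Employee
{
    public int EmpId { get; set; }
    public double Basic { get; set; }
    public double HRA { get; set; }
}

// Scenario 3: no fields; everything the method needs arrives as a parameter.
class SalaryCalculator
{
    public double Calculate(Employee input)
    {
        // Placeholder formula (assumption): Basic + HRA.
        return input.Basic + input.HRA;
    }
}

static class ConcurrencyDemo
{
    // One shared calculator instance, many threads, no shared mutable state.
    public static double[] CalculateAll(Employee[] employees)
    {
        var calculator = new SalaryCalculator();
        var results = new double[employees.Length];
        Parallel.For(0, employees.Length, i =>
        {
            // Each iteration writes only its own slot in the results array.
            results[i] = calculator.Calculate(employees[i]);
        });
        return results;
    }
}
```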

The second SOLID principle, the Open/Closed Principle, says "open for extension, closed for modification". What does it mean? When we write a class, it should be complete. There should be no reason to modify it, but it should be extensible via subclassing and overriding. A static method cannot be overridden, so keeping Calculate() as an instance method is what makes this possible. So what would our final version look like?

    interface ISalaryCalculator
    {
        double Calculate(Employee input);
    }
    class SimpleSalaryCalculator : ISalaryCalculator
    {
        public virtual double Calculate(Employee input)
        {
            return input.Basic + input.HRA;
        }
    }
    class TaxAwareSalaryCalculator : SimpleSalaryCalculator
    {
        public override double Calculate(Employee input)
        {
            return base.Calculate(input) - GetTax(input);
        }
        private double GetTax(Employee input)
        {
            //Return tax
            throw new NotImplementedException();
        }
    }

As I mentioned in my previous posts, always program to an interface. In the above code snippet I implemented the interface implicitly, only to save space here; always implement it explicitly. The calculation logic should then be kept in a protected function so that inherited classes can call that function if required.
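A minimal sketch of that advice, assuming the protected method is called `CalculateCore` (a hypothetical name) and assuming a flat tax rate for `TaxAwareSalaryCalculator` (also hypothetical; the original leaves `GetTax` unimplemented):

```csharp
using System;

// Minimal Employee entity for this sketch (assumed shape).
class Employee
{
    public double Basic { get; set; }
    public double HRA { get; set; }
}

interface ISalaryCalculator
{
    double Calculate(Employee input);
}

class SimpleSalaryCalculator : ISalaryCalculator
{
    // Explicit implementation: callers must go through ISalaryCalculator.
    double ISalaryCalculator.Calculate(Employee input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return CalculateCore(input);
    }

    // The calculation logic lives in a protected virtual method so that
    // subclasses can reuse or extend it.
    protected virtual double CalculateCore(Employee input)
    {
        return input.Basic + input.HRA;
    }
}

class TaxAwareSalaryCalculator : SimpleSalaryCalculator
{
    // Hypothetical flat tax rate, for illustration only.
    private readonly double taxRate;

    public TaxAwareSalaryCalculator(double taxRate)
    {
        this.taxRate = taxRate;
    }

    protected override double CalculateCore(Employee input)
    {
        double gross = base.CalculateCore(input);
        return gross - GetTax(gross);
    }

    private double GetTax(double gross)
    {
        return gross * taxRate;
    }
}
```

Because the interface method is explicit, callers have to hold an `ISalaryCalculator` reference, which fits the program-to-interface advice, and subclasses extend `CalculateCore` rather than the public entry point.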

Below is how this calculator class should be consumed.

    class SalaryCalculatorFactory
    {
        internal static ISalaryCalculator GetCalculator()
        {
            // Dynamic logic to create the ISalaryCalculator object
            return new SimpleSalaryCalculator();
        }
    }
    class PaySlipGenerator
    {
        void Generate()
        {
            Employee emp = new Employee() { };
            double salary = SalaryCalculatorFactory.GetCalculator().Calculate(emp);
        }
    }

The factory class encapsulates the logic of deciding which child class to use. It can be static, as above, or dynamic using reflection. As long as the only reason for this class to change is object creation, we are not violating the Single Responsibility Principle.
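For the dynamic option, one possible shape is a factory that receives a type name (for example from a configuration file) and instantiates it with `Type.GetType` and `Activator.CreateInstance`. This is a sketch under assumptions: the `GetCalculator(string)` overload is hypothetical, the calculator types are assumed to live in the same assembly (or be given an assembly-qualified name), and each must have a public parameterless constructor.

```csharp
using System;

// Minimal types for this sketch (assumed shapes).
class Employee
{
    public double Basic { get; set; }
    public double HRA { get; set; }
}

interface ISalaryCalculator
{
    double Calculate(Employee input);
}

class SimpleSalaryCalculator : ISalaryCalculator
{
    public double Calculate(Employee input) => input.Basic + input.HRA;
}

class SalaryCalculatorFactory
{
    // Creates the calculator whose type name is supplied at run time.
    internal static ISalaryCalculator GetCalculator(string typeName)
    {
        // Resolves types in the calling assembly or by assembly-qualified name;
        // throws TypeLoadException if the name cannot be resolved.
        Type type = Type.GetType(typeName, throwOnError: true);
        if (!typeof(ISalaryCalculator).IsAssignableFrom(type))
        {
            throw new InvalidOperationException(typeName + " does not implement ISalaryCalculator.");
        }
        // Requires a public parameterless constructor.
        return (ISalaryCalculator)Activator.CreateInstance(type);
    }
}
```

Switching to another calculator then becomes a matter of supplying the name of a different `ISalaryCalculator` implementation; neither the factory nor its callers need to be edited.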

In case you are going for a hybrid class and need to invoke the calculator from the Employee class itself, for example through a Salary property, do it as below.

    class Employee
    {
        public string Name { get; set; }
        public int EmpId { get; set; }
        public double Basic { get; set; }
        public double HRA { get; set; }
        
        public double Salary
        {
            //NOT RECOMMENDED 
            get{return SalaryCalculatorFactory.GetCalculator().Calculate(this);}
        }
    }

This way we ensure that, even if the salary calculation logic changes, the Employee class will not change.

Conclusion

Don't code when we are thinking. Don't think when we are coding

  • Spend some time on class design before coding. Show the class diagram to 2-3 fellow programmers and get their opinions.
  • Name the class wisely. There is no hard rule, but below are some I follow
    • Entity classes should be named with nouns that represent a type of object, e.g. Employee
    • Helper / worker class names should reflect that the class is a worker, e.g. SalaryCalculator, PaySlipGenerator etc.
    • Verbs should never be used as class names, e.g. class CalculateSalary{}

Tuesday, October 21, 2014

Delete a SQL Server database schema with all its objects

Recently, as part of some R&D, I had to delete all database schemas in a SQL Server database. The major pain I foresaw was identifying the objects associated with each schema and deleting them in the right order. I was confident that somebody must have faced the same problem earlier and that a script would be available as-is. That was correct: I found a good link in the very first Google search. It's given below.

http://ranjithk.com/2010/01/31/script-to-drop-all-objects-of-a-schema/#comment-428

Many thanks to this guy. But when I tried deleting the schema in my database using this SP, I got an error saying that the schema cannot be dropped as there are some user-defined table types inside it. The technique this script uses to get the objects of a schema is to query sys.objects, and that never returns the user-defined table types inside the schema.

SELECT *
FROM sys.objects SO
WHERE SO.schema_id = schema_id(@SchemaName) order by name

Other people might have faced this too, so I read through the comments, but no luck. So I had to spend some time on the query and add code to delete the user-defined table types (UDTTs) too.

--Add DROP TYPE statements into the table
--QUOTENAME guards against schema or type names that need brackets
INSERT INTO #dropcode
SELECT 'DROP TYPE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(name)
FROM   sys.types
WHERE  is_table_type = 1 AND schema_id = schema_id(@SchemaName)

The file can be downloaded from here.
 
Once again, thanks to Ranjith, the author of the original post; I hope he won't mind me changing his work and redistributing it.

Tuesday, October 7, 2014

SQL Server internals - How to see where my data records are stored

As I mentioned in many of my previous posts, it's very difficult for me to learn something without seeing how it's done internally. For example, you can see how I explored the .Net GC in one of my previous posts. This time I am trying to learn how SQL Server stores data internally.

Where does SQL Server store our tables & records?

As everybody knows, it's on disk. But which file, and where is it located? Each database requires at least 2 files (one data file and one transaction log file), and we can see the file paths in the database Properties dialog in SQL Server Management Studio or query the details.

How the data records and tables are organized

We could see that the data is stored in normal files with the extensions .mdf, .ndf and .ldf (the .ldf file holds the transaction log). Does that mean we can open them in Notepad and see the data? Does SQL Server just open the file and write into it, the way we did in the C/C++ labs in college?

Absolutely not. SQL Server is production-ready software, so it cannot work like academic code. It has more levels that optimize storage for maximum performance. One level is filegroups, where we can specify more than one file for a group and associate the group with a partition. Another level is the page. SQL Server considers a page the atomic unit of storage, and the page size is 8KB. It does IO operations such as reads and caching at page level only: even if we need one record from a page, it reads the entire page.
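To put numbers on page-level IO, the sketch below uses the row-count estimate from Microsoft's guidance on estimating clustered index size: 8,096 usable bytes per 8KB page after the 96-byte page header, with each row costing its own size plus a 2-byte slot-array entry. It assumes the row size already includes row overhead and ignores fill factor, so treat the results as estimates. The `PageMath` class is hypothetical.

```csharp
using System;

static class PageMath
{
    // SQL Server page size and the bytes left for rows after the 96-byte page header.
    public const int PageSizeBytes = 8192;
    public const int PageHeaderBytes = 96;
    public const int UsableBytes = PageSizeBytes - PageHeaderBytes; // 8096

    // Each row also costs a 2-byte entry in the page's row offset (slot) array.
    public static int RowsPerPage(int rowSizeBytes)
    {
        // 8060 bytes is the maximum in-row size of a single row.
        if (rowSizeBytes <= 0 || rowSizeBytes > 8060)
            throw new ArgumentOutOfRangeException(nameof(rowSizeBytes));
        return UsableBytes / (rowSizeBytes + 2); // integer division rounds down
    }

    // Pages needed for numRows rows, assuming every page is filled completely.
    public static long PagesFor(long numRows, int rowSizeBytes)
    {
        int perPage = RowsPerPage(rowSizeBytes);
        return (numRows + perPage - 1) / perPage; // round up
    }
}
```

For a 100-byte row this gives 79 rows per page, so fetching a single row reads a whole page that can hold up to 78 other rows.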

Let's get into how the records are stored. The storage order of records in a SQL Server table is based on the clustered index key, and normally the primary key is the clustered index. We cannot have more than one storage order for data records; that is why only one clustered index is allowed.

How to inspect SQL Server pages

But there is something called non-clustered indexes. If the records cannot be stored in more than one order, how do they help us? They are separate data structures that describe a different order of the rows. Before going into how non-clustered indexes work, let's get a full understanding of how the clustered index works and how to see the data inside a page.
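As a toy model only (SQL Server actually uses B-trees of 8KB pages, not in-memory collections), the sketch below keeps rows once, in an ordered collection keyed by the clustering key, and models a non-clustered index on Name as a separate sorted structure whose entries store the clustering key to reach the full row. The `Customer` and `ToyTable` types and their methods are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

record Customer(int Id, string Name, string City);

class ToyTable
{
    // "Clustered index": the rows themselves, kept in clustering-key (Id) order.
    private readonly SortedDictionary<int, Customer> clustered = new();

    // "Non-clustered index" on Name: index key -> clustering keys of matching rows.
    private readonly SortedDictionary<string, List<int>> byName = new(StringComparer.Ordinal);

    public void Insert(Customer c)
    {
        clustered.Add(c.Id, c);
        if (!byName.TryGetValue(c.Name, out var ids))
        {
            ids = new List<int>();
            byName.Add(c.Name, ids);
        }
        ids.Add(c.Id);
    }

    // Scan in clustering-key order.
    public IEnumerable<Customer> ScanClustered() => clustered.Values;

    // Seek the Name index, then look up each full row via its clustering key.
    public IEnumerable<Customer> FindByName(string name)
    {
        if (byName.TryGetValue(name, out var ids))
            foreach (int id in ids)
                yield return clustered[id];
    }

    // Walk rows in Name order without storing the rows a second time.
    public IEnumerable<Customer> ScanByName()
    {
        foreach (var ids in byName.Values)
            foreach (int id in ids)
                yield return clustered[id];
    }
}
```

The rows are stored once, in Id order; the Name index only adds a second ordering on top of them.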

I am glad to say that people before me have already thought the same way and done enough hard work to explain the storage with good pictures. So why do the task again? I just read their blogs and understood how it works, so I am sharing the same via my blog.

Below is the blog post where I could see the storage explained with the undocumented SQL Server commands DBCC IND & DBCC PAGE
http://www.mssqltips.com/sqlservertip/1578/using-dbcc-page-to-examine-sql-server-table-and-index-data/

References

http://www.practicalsqldba.com/2012/04/sql-server-index-fragmentation.html
https://www.simple-talk.com/sql/database-administration/sql-server-storage-internals-101/

My interest was index fragmentation, so I did some more research on it and am preparing my own post where we can see how fragmentation is created and resolved.