Monday, December 30, 2013

Merging two collections without duplicates in C# .NET using LINQ

Before LINQ, the usual way to merge two collections without duplicate elements was to loop through the first list, check whether each element already exists in the second list, and add it if it does not. LINQ makes this much easier with the Union extension method. Let's see how two simple integer arrays can be merged without duplicates.

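            // Requires: using System; using System.Linq;
            int[] list1 = { 2, 3, 1 };
            int[] list2 = { 2, 3, 10 };
            // Union yields the distinct elements of both arrays, in order of first appearance
            foreach (int number in list1.Union<int>(list2))
            {
                Console.WriteLine(number);
            }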

Output
2
3
1
10
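For comparison, the pre-LINQ approach described at the start can be sketched as a plain loop. This is a minimal sketch: the helper name MergeWithoutDuplicates is hypothetical, and it collects into a new list (rather than appending to the second list) so that the order matches what Union produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ManualMerge
{
    // Hypothetical helper: the loop-based approach from before LINQ.
    // Each number is added only if the result does not already contain it.
    public static List<int> MergeWithoutDuplicates(int[] first, int[] second)
    {
        var result = new List<int>();
        foreach (int number in first)
        {
            if (!result.Contains(number)) result.Add(number); // linear scan
        }
        foreach (int number in second)
        {
            if (!result.Contains(number)) result.Add(number);
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        int[] list1 = { 2, 3, 1 };
        int[] list2 = { 2, 3, 10 };

        List<int> manual = ManualMerge.MergeWithoutDuplicates(list1, list2);
        List<int> linq = list1.Union(list2).ToList();

        Console.WriteLine(string.Join(",", manual)); // 2,3,1,10
        Console.WriteLine(string.Join(",", linq));   // 2,3,1,10
    }
}
```

List<T>.Contains scans the whole list, so the loop does O(n * m) work. Union keeps track of the elements it has already returned in a hash-based set, which is one reason to prefer it for larger inputs.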

By default, Union removes duplicates, and for a primitive type such as int this is easy because the framework already knows how to compare two values. What about a custom type, such as an Employee class with two properties, EmpId and Name? How does LINQ's Union method know that two Employee objects are equal? When we say two objects are the same, we usually mean that one or more of their properties are equal. Let's see how a custom type can tell the framework about its notion of equality.
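Before looking at the answer, it helps to see what happens when a class overrides neither method. The sketch below uses a hypothetical PlainEmployee class (the same two properties, no overrides). It inherits Object's reference-based Equals and GetHashCode, so Union only removes an element when the very same instance appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with no Equals/GetHashCode overrides:
// equality falls back to reference equality from System.Object.
class PlainEmployee
{
    public string Name { get; set; }
    public int EmpId { get; set; }
}

class Program
{
    static void Main()
    {
        var list1 = new List<PlainEmployee> { new PlainEmployee { EmpId = 1, Name = "joy" } };
        var list2 = new List<PlainEmployee> { new PlainEmployee { EmpId = 1, Name = "joy" } };

        // Same data, but two different instances: Union keeps both.
        Console.WriteLine(list1.Union(list2).Count()); // 2

        // The same instance in both lists is equal to itself, so it is removed.
        var shared = new List<PlainEmployee> { list1[0] };
        Console.WriteLine(list1.Union(shared).Count()); // 1
    }
}
```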

Object.GetHashCode()

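This method returns a hash code for an object as an integer value. When we call Union, it internally calls this method on each Employee object. If two objects return different hash codes, the framework treats them as different. If they return the same hash code, they are only candidates for being equal (different objects can share a hash code), so the framework goes on to call Equals.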

Object.Equals()

Whenever an object's hash code matches the hash code of an object the framework has already seen, it calls this method to confirm whether the two are really equal. The Equals override receives the other object as its parameter (the current instance, this, is the first one), so we can compare the two here and return a boolean value. If we return true, the framework treats the later object as a duplicate and leaves it out of the union result. So our Employee class becomes:

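    class Employee
    {
        public string Name { get; set; }
        public int EmpId { get; set; }

        // Two employees are equal when their EmpIds match
        public override bool Equals(object obj)
        {
            bool isEqual = false;
            Employee emp = obj as Employee;
            if (emp != null)
            {
                isEqual = emp.EmpId == this.EmpId;
            }
            return isEqual;
        }

        // Hash code is based on Name only (assumes Name is never null).
        // Note: this does not match Equals, which compares EmpId, so two
        // employees with the same EmpId but different Names get different
        // hash codes and Union never calls Equals on them.
        public override int GetHashCode()
        {
            return this.Name.GetHashCode(); 
        }
    }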

Union first compares hash codes. In this case the hash code is computed from the Name property only. Only when two hash codes are the same does the framework call Equals, and it decides based on its return value.
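The order of these checks can be illustrated with a simplified, hypothetical model of the bookkeeping Union does. UnionModel.Union below is not the framework's source code. It groups the elements seen so far by hash code and calls Equals only against elements in the same group. It assumes the sequences contain no null elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string Name { get; set; }
    public int EmpId { get; set; }

    public override bool Equals(object obj)
    {
        Employee emp = obj as Employee;
        return emp != null && emp.EmpId == this.EmpId;
    }

    // Same choice as the article: hash on Name only
    public override int GetHashCode()
    {
        return this.Name.GetHashCode();
    }
}

static class UnionModel
{
    // Hypothetical, simplified model of Union's duplicate check (not framework code)
    public static List<T> Union<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var buckets = new Dictionary<int, List<T>>(); // hash code -> elements already kept
        var result = new List<T>();
        foreach (IEnumerable<T> source in new[] { first, second })
        {
            foreach (T item in source)
            {
                int hash = item.GetHashCode();          // step 1: compare hash codes
                List<T> candidates;
                if (!buckets.TryGetValue(hash, out candidates))
                {
                    candidates = new List<T>();
                    buckets[hash] = candidates;
                }
                bool duplicate = false;
                foreach (T seen in candidates)
                {
                    if (seen.Equals(item))              // step 2: Equals only on hash match
                    {
                        duplicate = true;
                        break;
                    }
                }
                if (!duplicate)
                {
                    candidates.Add(item);
                    result.Add(item);
                }
            }
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        var list1 = new List<Employee> { new Employee { EmpId = 1, Name = "joy" }, new Employee { EmpId = 2, Name = "george" } };
        var list2 = new List<Employee> { new Employee { EmpId = 1, Name = "joy" }, new Employee { EmpId = 2, Name = "mon" } };

        List<Employee> model = UnionModel.Union(list1, list2);
        foreach (Employee emp in model)
        {
            Console.WriteLine("{0},{1}", emp.EmpId, emp.Name);
        }
        // Compare the model's result with the real Union
        Console.WriteLine(model.SequenceEqual(list1.Union(list2)));
    }
}
```

With this Employee class, the model should keep the same three employees as the real Union call, since "george" and "mon" hash differently and Equals is never consulted for them.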

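At first glance it may not be obvious why we need both methods. The hash code is a cheap first filter, and Equals makes the final decision. If we override only GetHashCode and leave out Equals, the class inherits Object.Equals, which compares references, so two separate Employee instances with the same data are still not considered equal. Let's see how the class above can be used to combine two Employee lists.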

            IList<Employee> empList1 = new List<Employee>() 
            { 
                new Employee(){EmpId=1,Name="joy"},
                new Employee(){EmpId=2,Name="george"}
            };
            IList<Employee> empList2 = new List<Employee>() 
            { 
                new Employee(){EmpId=1,Name="joy"},
                new Employee(){EmpId=2,Name="mon"} 
            };
            foreach (Employee emp in empList1.Union<Employee>(empList2))
            {
                Console.WriteLine("{0},{1}",emp.EmpId,emp.Name);
            }

Output
1,joy
2,george
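2,mon

Notice that both 2,george and 2,mon appear even though their EmpId values match. Their Names differ, so their hash codes differ and Union never calls Equals on them. This is a consequence of GetHashCode and Equals disagreeing: .NET expects objects that are equal to return the same hash code.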
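The earlier point that overriding GetHashCode alone is not enough can be checked directly. HashOnlyEmployee below is a hypothetical variant that overrides GetHashCode (on EmpId) but not Equals, so it inherits Object.Equals, which compares references.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant: overrides GetHashCode but NOT Equals,
// so Equals falls back to Object.Equals (reference equality).
class HashOnlyEmployee
{
    public string Name { get; set; }
    public int EmpId { get; set; }

    public override int GetHashCode()
    {
        return EmpId.GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var list1 = new List<HashOnlyEmployee> { new HashOnlyEmployee { EmpId = 1, Name = "joy" } };
        var list2 = new List<HashOnlyEmployee> { new HashOnlyEmployee { EmpId = 1, Name = "joy" } };

        // Hash codes match, but the inherited Equals compares references,
        // so the two instances are still treated as different.
        Console.WriteLine(list1[0].GetHashCode() == list2[0].GetHashCode()); // True
        Console.WriteLine(list1.Union(list2).Count());                       // 2
    }
}
```

Because the hash codes match, Union does call Equals here, but the inherited reference comparison returns false and both instances survive.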

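Overriding Equals and GetHashCode in the Employee class itself is the simplest method. If you follow the SOLID principles strictly (in particular the Single Responsibility Principle), you can move the comparison logic into a separate class. All you need to do is implement the IEqualityComparer<Employee> interface in it.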

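    class Employee
    {
        public string Name { get; set; }
        public int EmpId { get; set; }
    }
    class EmployeeComparer : IEqualityComparer<Employee>
    {
        // Same rule as before: two employees are equal when their EmpIds match
        bool IEqualityComparer<Employee>.Equals(Employee x, Employee y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return x.EmpId == y.EmpId;
        }
 
        // Hash code is still based on Name only (assumes Name is never null)
        int IEqualityComparer<Employee>.GetHashCode(Employee obj)
        {
            return obj.Name.GetHashCode();
        }
    }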
// Testing: pass an EmployeeComparer as the second argument to Union
            foreach (Employee emp in empList1.Union<Employee>(empList2,new EmployeeComparer()))
            {
                Console.WriteLine("{0},{1}",emp.EmpId,emp.Name);
            }
        

If you want the comparison to consider only EmpId, put that logic in both GetHashCode and Equals. Only the comparer class is shown below for reference; it now hashes on EmpId as well, so Equals and GetHashCode agree. The same change works in the earlier sample where we didn't use IEqualityComparer: just return EmpId.GetHashCode() from the GetHashCode override in the Employee class.
        

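    class EmployeeComparer : IEqualityComparer<Employee>
    {
        // Two employees are equal when their EmpIds match
        bool IEqualityComparer<Employee>.Equals(Employee x, Employee y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return x.EmpId == y.EmpId;
        }
 
        // Hash on EmpId too, so equal employees always get equal hash codes
        int IEqualityComparer<Employee>.GetHashCode(Employee obj)
        {
            return obj.EmpId.GetHashCode();
        }
    }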

Output
1,joy
2,george        
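As mentioned above, the EmpId-based hash also works without a separate comparer. A minimal, self-contained sketch: the original Employee class with only GetHashCode changed to return EmpId.GetHashCode(), so Union needs no comparer argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The earlier Employee class, with GetHashCode switched from Name to EmpId
// so that it agrees with Equals.
class Employee
{
    public string Name { get; set; }
    public int EmpId { get; set; }

    public override bool Equals(object obj)
    {
        Employee emp = obj as Employee;
        return emp != null && emp.EmpId == this.EmpId;
    }

    public override int GetHashCode()
    {
        return this.EmpId.GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        IList<Employee> empList1 = new List<Employee>
        {
            new Employee { EmpId = 1, Name = "joy" },
            new Employee { EmpId = 2, Name = "george" }
        };
        IList<Employee> empList2 = new List<Employee>
        {
            new Employee { EmpId = 1, Name = "joy" },
            new Employee { EmpId = 2, Name = "mon" }
        };

        // No comparer argument: Union uses the overridden Equals/GetHashCode
        foreach (Employee emp in empList1.Union(empList2))
        {
            Console.WriteLine("{0},{1}", emp.EmpId, emp.Name);
        }
        // Prints:
        // 1,joy
        // 2,george
    }
}
```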

Happy coding.
