Tuesday, September 3, 2013

Reading formatted MS Excel cell values using Open XML SDK and showing as HTML

Most of the existing mechanisms which helps to access Excel cells from our applications doesn't give us the formatting of the excel cell. If we select the excel data using OleDB connectors it return plain text. The Excel COM interop give more, but it gives format details for entire cell. What about an Excel cell contains data like below and we want to show the excel cell values with formatting in a MVC application using a Rich Text Editor say KendoUI editor?

Bold data,
Italics, Underlined and big font in green color

Since everything in new Excel file format is XML based and open, the sdk provides the options. But not as easy as we think. We need to get the reference string and we need to iterate through the RunProperties collection. The code is as follows

private string GetCellValue(Cell excelCell)
        {
            if (excelCell == null) 
                return null;
            if (excelCell.DataType == null)
                return excelCell.InnerText;
 
            string value = excelCell.InnerText;
            switch (excelCell.DataType.Value)
            {
                case CellValues.SharedString:
                    value = ReadFromSharedString(excelCell);
                    break;
                case CellValues.Boolean:
                    switch (value)
                    {
                        case "0":
                            value = "false";
                            break;
                        default:
                            value = "true";
                            break;
                    }
                    break;
            }
            return value;
        }

This starts with the assumption that you have basic idea how to obtain Excel Cell object using open xml. If you are not clear on it please refer the link. Make sure you have the open xml sdk dll named DocumentFormat.OpenXml.dll. You may download from this link too.

What is SharedString and ReadFromSharedString? Actually excel keeps a reference to the value and the actual value will be there in a different node. The reference will normally be integer and that will be stored in the excelCell.InnerText property.Lets see the code for ReadSharedString();

private string ReadFromSharedString(Cell cell)
{
    string value = cell.InnerText;
    SharedStringTablePart sstPart = GetSharedTablePart(cell);
    // lookup value in shared string table
    if (sstPart != null && sstPart.SharedStringTable != null)
    {
        int index = int.Parse(value);
        if (sstPart.SharedStringTable.ElementAt(index).ChildElements.Count == 0)
        {
            value = sstPart.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            //ChildElements is collection of Run
            value = GetStringValueFromRuns(sstPart.SharedStringTable.ElementAt(index).ChildElements);
        }
    }
    return value;
}


First part is about how to get the SharedStringTablePart object? Next is about why we are calling GetValueFromRuns(). First things is simple. You can understand by looking the below code. Second one I will explain after the code segment to get the SharedStringTable.


private SharedStringTablePart GetSharedTablePart(Cell cell)
{
    Worksheet workSheet = cell.Ancestors<Worksheet>().FirstOrDefault();
    SpreadsheetDocument doc = workSheet.WorksheetPart.OpenXmlPackage as SpreadsheetDocument;
    SharedStringTablePart sharedStringTablePart = doc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    return sharedStringTablePart;
}


Now coming to the GetStringValueFromRuns. Basically the cell data is arranged as series of Run elements. Each run element represents unique formatting and corresponding text.In this case 'Bold Data' will be a run element and 'Italics' will be next run element. So we need to iterate the run collection and create the required formatting.


private string GetStringValueFromRuns(OpenXmlElementList openXmlElementList)
{
    StringBuilder text = new StringBuilder();
    foreach (Run run in openXmlElementList)
    {
        //    //t ADD OPENNING TAGS HERE, if any
        if (run.RunProperties != null)
        {
            foreach (OpenXmlLeafElement element in run.RunProperties.OrderBy(e => e.LocalName))
            {
                //StartTagWrite is the extension method.
                element.StartTagWrite(text);
            }
        }
        text.Append(run.Text.Text.Replace("\n","<br/>"));
        if (run.RunProperties != null)
        {
            foreach (OpenXmlLeafElement element in run.RunProperties.OrderByDescending(e => e.LocalName))
            {
                element.EndTagWrite(text);
            }
        }
    }
    return text.ToString();
}


Each Run will be having Run properties which define the attributes of each run. Important thing to note down is the Run properties classes are different in different Office versions.

Note the extension methods StartTagWrite and EndTagWrite. Those are defined in a different class


    static class Extension
    {
        public static void StartTagWrite(this OpenXmlLeafElement runElement, StringBuilder writer)
        {
            IWriteOperation x = new HTMLWriteOperation { Writer = writer };
            x.StartTagWrite((dynamic)runElement);
        }
        public static void EndTagWrite(this OpenXmlLeafElement runElement, StringBuilder writer)
        {
            IWriteOperation x = new HTMLWriteOperation { Writer = writer };
            x.EndTagWrite((dynamic)runElement);
        }
    }

Here comes a design challenge. Look at the casting using dynamic. This is because each run property is different class. ie there is no base class for run properties. The property class Bold Italic and Color don't have a common base. The base which we gets is the OpenXmlLeafNode. This is in fact a base class for many nodes other than Run properties. So we need to use the dynamic casting and call the method. Since its dynamic it will call corresponding methods defined in the interface.Below is the interface declaration and corresponding class


private interface IWriteOperation
{
    void StartTagWrite(Bold bold);
    void StartTagWrite(Italic it);
    void StartTagWrite(Underline underline);
    void StartTagWrite(Color c);
    void StartTagWrite(Font font);
    void StartTagWrite(FontSize fontSize);
    void StartTagWrite(FontFamily fontFamily);
    void StartTagWrite(FontScheme fontScheme);
    void StartTagWrite(RunFont runFont);
    void StartTagWrite(Strike strike);
    void StartTagWrite(VerticalTextAlignment vertialAlignment);
    void StartTagWrite(Shadow shadow);
    void StartTagWrite(Outline outline);
    void StartTagWrite(Condense condense);

    void EndTagWrite(Bold b);
    void EndTagWrite(Italic b);
    void EndTagWrite(Underline b);
    void EndTagWrite(Color u);
    void EndTagWrite(Font u);
    void EndTagWrite(FontSize u);
    void EndTagWrite(FontFamily u);
    void EndTagWrite(RunFont u);
    void EndTagWrite(FontScheme u);
    void EndTagWrite(Strike u);
    void EndTagWrite(VerticalTextAlignment u);
    void EndTagWrite(Shadow u);
    void EndTagWrite(Outline u);
    void EndTagWrite(Condense u);
}
private class HTMLWriteOperation : IWriteOperation
{
    public StringBuilder Writer { getset; }

    void IWriteOperation.StartTagWrite(Bold iThing) { Writer.AppendLine("<B>"); }
    void IWriteOperation.EndTagWrite(Bold iThing) { Writer.AppendLine("</B>"); }

    void IWriteOperation.EndTagWrite(Underline aThing) { Writer.AppendLine("</U>"); }
    void IWriteOperation.StartTagWrite(Underline aThing) { Writer.AppendLine("<U>"); }

    void IWriteOperation.StartTagWrite(Italic aThing) { Writer.AppendFormat("<I>"); }
    void IWriteOperation.EndTagWrite(Italic aThing) { Writer.AppendFormat("</I>"); }

    void IWriteOperation.StartTagWrite(Strike u) { Writer.AppendLine("<del>"); }
    void IWriteOperation.EndTagWrite(Strike aThing) { Writer.AppendLine("</del>"); }

    void IWriteOperation.StartTagWrite(Color u)
    {
        if (u.Rgb != null)
        {
            string colorFormat = @"<span style=""color:#{0};"">";
            string span = string.Format(colorFormat, u.Rgb.Value.Substring(2, 6));
            Writer.AppendLine(span);
        }
    }
    void IWriteOperation.EndTagWrite(Color aThing)
    {
        Writer.AppendLine("</span>");
    }
    void IWriteOperation.StartTagWrite(RunFont u)
    {
        if (u.Val.HasValue)
        {
            string fontFormat = @"<span style=""font-family:'{0}';"">";
            string span = string.Format(fontFormat, u.Val.Value);
            Writer.AppendLine(span);
        }
    }
    void IWriteOperation.EndTagWrite(RunFont aThing)
    {
        Writer.AppendLine("</span>");
    }

    void IWriteOperation.EndTagWrite(FontSize aThing)
    {
        Writer.AppendLine("</span>");
    }
    void IWriteOperation.StartTagWrite(FontSize u)
    {
        string fontSizeFormat = @"<span style=""font-size:{0}px;"">";
        string span = string.Format(fontSizeFormat, u.Val);
        Writer.AppendLine(span);
    }

    void IWriteOperation.StartTagWrite(Font u) { }
    void IWriteOperation.EndTagWrite(Font aThing) { }

    void IWriteOperation.EndTagWrite(FontScheme aThing) { }
    void IWriteOperation.StartTagWrite(FontScheme u) { }

    void IWriteOperation.EndTagWrite(Condense aThing) { }
    void IWriteOperation.StartTagWrite(Condense u) { }
    
    void IWriteOperation.EndTagWrite(VerticalTextAlignment aThing) { }
    void IWriteOperation.StartTagWrite(VerticalTextAlignment u) { }

    void IWriteOperation.EndTagWrite(Shadow aThing) { }
    void IWriteOperation.StartTagWrite(Shadow u) { }

    void IWriteOperation.StartTagWrite(FontFamily u) { }
    void IWriteOperation.EndTagWrite(FontFamily aThing) { }

    void IWriteOperation.StartTagWrite(Outline u) { }
    void IWriteOperation.EndTagWrite(Outline aThing) { }   
}

Reading a cell using OpenXML itself involves lot of code. So this is certainly not going to be easy to understand. Let me know if anybody needs a sample which does it.

5 comments:

Naveen said...

hi,this is going to be so hard for me to understand,Please post me a sample please............

Unknown said...
This comment has been removed by the author.
Unknown said...

Hi,
I am trying to understand this and trying to put this in a console application but finding difficult.
Can you share the sample please, so that I can get the complete flow of code. Thank you.

Unknown said...

Hi, at line (foreach (Run run in openXMlelementlist)
It throws error, cannot convert text to run. Please help

Joymon said...

@Unknown, Sorry there were a lot of comments to moderate, and missed your comment. Not sure you are still facing issues. If so please share any exception details. Or a sample that shows issue would be great help.