Tuesday, October 8, 2013

Reading Excel cell with number formatting using OpenXML sdk

Background

In Excel, we can assign cell styles such as number format, date-time format, etc... If we have an excel document with that formatting, we normally would like to see the formatting when we read the same using a program. But if you are using OpenXML SDK to read excel file, it cannot be achieved directly. We need to read the style index of Excel cell object and find the corresponding CellStyle then format the CellValue based on the format code. Let's see how to achieve that using C# and .Net

Reading excel cell value with formatting

I assume that you have a basic understanding of OpenXML SDK programming. The below steps explains how to read the "A1" cell of an excel document with formatting using open xml sdk.

Step 1 - Obtain the object of Excel cell

First, create a OpenXMLExcelReader class. Accept the excel file name as a parameter via the constructor and keep in a member variable. Then start with the method ReadFirstCell()


    public class OpenXMLExcelReader
    {
        string _filePath;
        public OpenXMLExcelReader(string filePath)
        {
            _filePath = filePath;
        }
        public string ReadFirstCell()
        {
                return ReadCell( "A", 1);
        }
    }


Now, where is the ReadCell method which needs to be a generic implementation which will read any cell value by accepting cell reference? It can be defined as below


        public string ReadCell(string columnName, uint row)
        {
            string value = string.Empty;
            using (SpreadsheetDocument doc = SpreadsheetDocument.Open(_filePath, true))
            {
                var sheet = doc.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().First();
                var worksheetPart = (WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id.Value);
                Cell cell = GetCell(worksheetPart.Worksheet, columnName, row);
                if (cell != null)
                {
                    value = GetCellValue(cell);
                }
            }
            return value;
        }


GetCell() method is returning the cell object nicely. Let's see how the GetCell is implemented. You can see 2 methods which do their responsibilities neatly.


        private static Cell GetCell(Worksheet worksheet, string columnName, uint rowIndex)
        {
            Cell returnCell = null;
            Row row = GetRowFromWorksheet(worksheet, rowIndex);
            if (row != null)
            {
                returnCell= row.Elements<Cell>().Where(c => string.Compare
                       (c.CellReference.Value, columnName +
                       rowIndex, true) == 0).FirstOrDefault();
            }
            return returnCell;
        }
        private static Row GetRowFromWorksheet(Worksheet worksheet, uint rowIndex)
        {
            return worksheet.GetFirstChild<SheetData>().
              Elements<Row>().Where(r => r.RowIndex == rowIndex).FirstOrDefault();
        }

Step 2 - Reading value from excel Cell object using CellStyle

This step is a little more complex. Sometimes we can read InnerText directly. Sometimes we need to get value from a Shared string table. Sometimes using Run elements and if formatting is applied, we need to retrieve the format and apply to the text which is read from cell. Let's see how we can organize it.

        private static string GetCellValue(Cell excelCell)
        {
            string value;
            if (excelCell == null)
                return null;
            if (excelCell.DataType == null)
            {
                return GetCellValueWithoutConsideringDataType(excelCell);
            }
            value = excelCell.InnerText;
            //If none of the below cases are executed, return the innerText            
            switch (excelCell.DataType.Value)
            {
                case CellValues.SharedString:
                    value = GetCellValueByReadingFromSharedString(excelCell);
                    break;
                case CellValues.Boolean:
                    switch (value)
                    {
                        case "0": value = "false"break;
                        default: value = "true"break;
                    }
                    break;
            }
            return value;
        }


In most of the cases, if the cell formatting is applied as number the DataType will be null and it will get into the method GetCellValueWithoutConsideringDataType(). Below is the implementation.


        private static string GetCellValueWithoutConsideringDataType(Cell excelCell)
        {
            CellFormat cellFormat = GetCellFormat(excelCell);
            if (cellFormat != null)
            {
                return GetFormatedValue(excelCell, cellFormat);
            }
            else
            {
                return excelCell.InnerText;
            }
        }

If there is a cell format associated with the cell, process accordingly, else return the innerText. Below is the code to obtain CellFormat object from the cell.

        private static CellFormat GetCellFormat(Cell cell)
        {
            WorkbookPart workbookPart = GetWorkbookPartFromCell(cell);
            int styleIndex = (int)cell.StyleIndex.Value;
            CellFormat cellFormat = (CellFormat)workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ElementAt(styleIndex);
            return cellFormat;
        }
 
        private static WorkbookPart GetWorkbookPartFromCell(Cell cell)
        {
            Worksheet workSheet = cell.Ancestors<Worksheet>().FirstOrDefault();
            SpreadsheetDocument doc = workSheet.WorksheetPart.OpenXmlPackage as SpreadsheetDocument;
            return doc.WorkbookPart;
        }

Now we have cell value and cell format. If the cell format has any number format, lets apply that.

        private static string GetFormatedValue(Cell cell, CellFormat cellformat)
        {
            string value;
            if (cellformat.NumberFormatId != 0)
            {
                string format = GetWorkbookPartFromCell(cell).WorkbookStylesPart.Stylesheet.NumberingFormats.Elements<NumberingFormat>()
                    .Where(i => i.NumberFormatId.Value == cellformat.NumberFormatId.Value)
                    .First().FormatCode;
                double number = double.Parse(cell.InnerText);
                value = number.ToString(format);
            }
            else
            {
                value = cell.InnerText;
            }
            return value;
        }


This handles only Number formatting.If you want to deal with other formats you need to extend this. Reading from the shared string table can be found in my previous post about reading excel cells with formatting.

1 comment:

assafwo said...

Unable to cast object of type 'DocumentFormat.OpenXml.Spreadsheet.Text' to type 'DocumentFormat.OpenXml.Spreadsheet.Run'.


Line 181: {
Line 182: StringBuilder text = new StringBuilder();
Line 183: foreach (Run run in openXmlElementList)
Line 184: {
Line 185: // //t ADD OPENNING TAGS HERE, if any