Tuesday, July 28, 2015

Best practices in SSIS Logging


  • Log always using SSIS logging mechanism
  • Do not reinvent logging mechanism using any manual methods even in C# script tasks
    • Script task to log into SSIS logging stream
      • http://dataqueen.unlimitedviz.com/2013/09/how-to-enable-custom-logging-for-an-ssis-script-task/
      • https://msdn.microsoft.com/en-us/library/ms136131%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396
  • Inject the log destination from invoking platform. This will help to correlate the SSIS logs with other application logs.
    • eg: If the SSIS is invoked from C# application let the C# provide the logging connection string.
  • Better log all the events and point to database as logging store.
  • Have periodic routine to cleanup the logs.

Above are the best practices as of my understanding and experience with SSIS. Sorry to say that, I cannot guarantee that it will work for all.

Tuesday, July 21, 2015

Do we need SQL Server Integration Services running to execute SSIS / .dtsx package?

Question - Do we need a SSIS windows service to execute .dtsx file?

Here is the scenario. We have a .dtsx package file and need to execute it. Should we need to have SQL Server Integration Services running in our machine to execute that package?

Answer - No

Lets look at what is SQL Server Integration Services. By definition

"The SSIS service is a Windows service that monitors the SSIS packages that are running. The SSIS service also manages the storage of SSIS packages" 
From Microsoft KB https://support.microsoft.com/en-us/kb/942176 & MSDN Page

It clearly says we don't need need SSIS running in our machine to execute a package. We simply need the DTExec.exe utility to run .dtsx packages. So what is the use of SSIS windows service?

It is just to keep the .dtsx packages safe, initiate, monitor and stop the execution. It really means that there can be different users and roles who can only execute packages. Also helps to control the packages in remote machine.

Question - Do we need SQL Server running to run integration services package(.dtsx)

The SSIS is related with SQL Server. So do we need to have any SQL Server instance running, if we want to run a .dtsx package?

Answer - No

Though the Integration Services is for data transformation, it is not developed with tight integration with SQL Server. We can develop any algorithm using integration services. As seen above DTExec.exe is the execution utility. It doesn't need any SQL Server instance running to execute dtsx package is the package is not establishing connection with SQL Server.

What is the confusion?

Why people often confuse .dtsx package with SQL Server? The reason is mainly everybody thinks that since Integration Services are for ETL and data related operations, it must be linked with SQL Server and SSIS. Another reason is the expansion of SSIS starts with SQL Server. The better way is to start calling it as simply "Integration Services". As of my understanding even MSFT has the confusion. Some places they use simply Integration Services, some places Microsoft Integration Services and sometime refer full SQL Server Integration Services. So we cannot blame anybody for getting confused :)

So lets consider this as another technology to achieve workflow execution platform recommended for building enterprise-level data integration and data transformations solutions. This technology has many tools and components to help us in various phases of development and deployment. 

BIDS - Business Intelligence Development Studio

The tool which is used to create the workflow using drag and drop mechanism. It is Visual Studio itself. It gives option to define parameters, variables, connections, log provider and edit the script tasks. It captures the workflow steps in the form of xml and saved as simple file with extension .dtsx.

.dtsx

The file extension of file which contains the workflow in xml. This format is open and XSD can be found in MSDN. This file can be imported to SSIS package store to have better management. Integration Services .(dtsx) files can be created either from BIDS or using our own program. 

DTExec.exe

Utility used to to execute any .dtsx package either stored in file system or SSIS package storage. Parameters are passed via command line. Location
[Drive]:\Program Files\Microsoft SQL Server\110\DTS\Binn\DTExec.exe

DTExecUI.exe

GUI utility to run .dtsx packages. Provides UI to enter the package parameters. Located at
[Drive]:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\DTExecUI.exe

SSIS

A windows service used to store and track execution of Integration Service packages (.dtsx)

Microsoft.SqlServer.ManagedDTS.dll

This is the .Net managed dll which provides API to create and execute .dtsx packages. This contains the namespace Microsoft.SqlServer.Dts.Runtime which holds the relevant classes. Located in below location if 2012 is installed

[Drive]:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SQLServer.ManagedDTS.dll

References

https://en.wikipedia.org/wiki/SQL_Server_Integration_Services
http://stackoverflow.com/questions/454802/what-is-the-ssis-package-and-what-does-it-do

Tuesday, July 14, 2015

Not abel to create DTS.LogProviderTextFile.2 when running SSIS from .Net

Recently I was trying to execute a SSIS package from C# .Net code and it was not working as expected. So I decided to get the logs from the package. Unfortunately there was no logs. So I decided to inject the LogProvider and ConnectionManager from my C# code during execution.

Google lead me to a MSDN link which explains the same.
https://msdn.microsoft.com/en-us/library/ms136023(v=sql.105).aspx.

I selected SQL Server 2012 to get more accurate code snippet. The BIDS version I used was VS2010 based.

But I got below exception when I run the sample code.

A first chance exception of type 'Microsoft.SqlServer.Dts.Runtime.DtsRuntimeException' occurred in Microsoft.SqlServer.ManagedDTS.dll

Additional information: The log provider type "DTS.LogProviderTextFile.2, {0A039101-ACC1-4E06-943F-279948323883}" specified for log provider "{652479A1-923C-452D-9328-5F703624B5FC}" is not recognized as a valid log provider type. This error occurs when an attempt is made to create a log provider for unknown log provider type. Verify the spelling in the log provider type name.

Googled again and was not able to get much help. So decided to debug myself. I added LogProvider from BIDS UI and checked its type in Visual Studio before executing the package. It says the type is "DTS.LogProviderTextFile.3". It clearly indicate that if we create log provider with .3 suffix, it will work. I tried and it succeeded.

So if the target SSIS run time is SQL 2012 change the provider type to DTS.LogProviderTextFile.3. 

Tuesday, July 7, 2015

Developing Electron HTML5-Javascript desktop application in Windows 7

What is Electron?

The official repository says "Build cross platform desktop apps with web technologies". Its really true but it comes with some limitations. We cannot write system programs or device drivers with this. Its a wrapper over Chromium with native pluggable points to provide a platform for rendering HTML and running Javascript inside executable. Yes its true that we can develop cross platform windows desktop application using HTML and Javascipt. Similar to creating windows 8 or universal apps using HTML5 and Javascript. Advantage with Electron is, it can run in different operations systems such as Linux, OS X along with Windows.

Just look at official repository to understand this "our own mini chromium browser which can render our html javascript application".

https://github.com/atom/electron/blob/master/docs/tutorial/quick-start.md

Steps to setup environment

We can setup in multiple ways. The easier way I found is given below
  1. Go to the release page in Eletron Github repository.
  2. Download the latest platform version. At the time of writing this post the version is 0.28.3 and file size is around 46 MB.
    1. Take the 64 bit version if you have the 64 bit machine. The format is electron-vx.xx.xx-win32-x64.zip .Eg electron-v0.28.3-win32-x64.zip
  3. Extract to c:\Electron\ . It can be to any folder but just for simplicity put in c: drive itself.
  4. Double click on C:\Electron\electron.exe file to test the Electron runtime. We can see a default application has started. The source code for this application will be found in C:\Electron\resources\default_app

Steps to create new project

  1. Create a folder for our app anywhere in the file system. In Electron desktop app is similar to web app which is a folder of html, javascript and other resource files. For ease of explanation we are assume that the path is c:\apps\testelectron
  2. Copy package.json, index.html, and main.js from C:\Electron\resources\default_app into c:\apps\testelectron. This will get us the starting point similar to File->New ->Project in Visual Studio
  3. Clean up index.html
    1. Remove all styles and scripts
    2. Change the title tag content to "My First Electron Application"
    3. Make sure body contains only the word "Hello world"
  4. Replaces the contents in the main.js with the main.js contents from Github help page 
  5. Now run the project by any of the following methods
    1. In command prompt type c:\Electron\Electron.exe c:\apps\testelectron
    2. Start Electron.exe by double clicking and drag and drop the testelectron folder into the window.

Hide F12 Developer tools in Electron application

Debugging Electron apps

As we are seeing we gets our favorite F12 dev tools to debug our app. By default the developer tools will be visible when we run the application. To hide that remove the below line from main.js.

mainWindow.openDevTools();

Avoid default Electron menu on our application

Now our application is running from Electron shell. The Shell shows the menu. We may check this by looking at the C:\Electron\resources\default_app. To remove that we need to make sure our app is the only one started. To do so copy testelectron folder into C:\Electron\resources and rename testelectron folder to app.

Some magic happens and now if we double click on Electron.exe our application will start alone :). As you know computer cannot do magic. Its all convention defined by developers to speed up the process.

Packaging and distributing

Just refer the official documentation on packaging and distribution.

Pros

  • One language and technology for web and windows. This helps the developers focus on the business goal rather than spending time on google to learn technology and language.
    • Can leverage javascript frameworks such as angular, jQuery etc...
  • Leverage async programming model from node.js modules.
  • Easy portability. No need to maintain different code bases.
  • Comes with Chromium's process model. Each window is different process. This reduces chances to crash if we build big apps. Also minimize the memory leak as the process itself gone when a window is closed.
  • Package management done via Node Package Manager (NPM)

Cons

  • Its heavy. Need to package runtime with our app.
  • Not all native operations can be done.
  • As always need to use only The Good Parts of javascript
  • Process model put overhead on window communication. Need to use IPC. Also it takes more total memory than single process model.
As of Microsoft, they use this technology in their new VSCode code editor. Similarly many big players are adopting it. Its worth taking a look.

Tuesday, June 30, 2015

C# async and await programming model from scratch

Introduction

This is a brief introduction to async and await keywords to a normal developer who wants to understand the basics and little insights to internals of how stuff works.

Background

Asynchronous programming is now an essential thing when we develop any application because it avoids waiting in main thread on long running operations such as disk I/O, network operations database access etc...In normal case, if our program needs something to be done from the results of these long operations, our code is struck until the operation is done and we proceed from that point. 

Using async mechanism, we can just trigger long running operations and can do other tasks. Those long running operations does the job in different thread and when they complete it, they notify our main code and our code can do the post actions from here. When we refer our code, its our main thread which deals with user interface or the thread which primarily process a web request. Sometimes we ourselves write these kind of long running operations.

What is async and await

In simple sense these are 2 new keywords introduced in .Net 4.5 to easily write asynchronous programs. They work in the method level. Of course we cannot make classes work in parallel as they are not unit of execution. 

Are these keywords known to CLR, the .Net run-time or a wrapper over TPL Task Parallel Library ? If they are wrappers, it it good to have language depends on a library written using same language?

We will find out the answer to these questions in this article.

History of .Net async programming

Threads were there from the very beginning of the .Net framework. They were the wrappers on operating system threads and little difficult to work with. Then more concepts such as background worker, async delegate and Task Parallel Library came to ease the async programming model. Those came as part of class library. C# language as such doesn't had 'out of the box' support for  async programming until the async and await keywords are introduced with C# 4.0. Lets see how the async and await helps us in async programming by examining each of these methods.

Example

Lets take the below example of finding factorial of first N numbers if they are completely divisible by 3. We are using console application for simplicity. If we had used a windows application we could easily hook into async event delegate handler and demo the async features in easily. But that won't help us to learn the language features.

Synchronous code


We can see there is a loop runs from 1 to 5 using counter variable. It find whether the current counter value is completely divisible by 3. If so it writes the factorial. The writing function calculates the factorial by calling FindFactorialWithSimulatedDelay() method. This method here in sample is going to put delay to simulate real life workload. In other sense this is the long running operation.

Easily we can see that the execution is happening in sequence. The WriteFactorial() call in loop waits until the factorial is calculated. Why should we wait here? Why can't we move to next number as there is no dependency between numbers? We can. But what about the Console.WriteLine statement in WriteFactorial(). It should wait until the factorial is found. It means we can asynchronously call FindFactorialWithSimulatedDelay() provided there is a call back to the WriteFactorial(). When the async invocation happens the loop can advance counter to next number and call the WriteFactorial().

Threading is one way we can achieve it. Since the threading is difficult and needs more knowledge than a common developer, we are using async delegates mechanism. Below is the rewrite of WriteFactorial() method using async delegate.

Making it async using async delegates

One of the easier method used earlier was to use Asynchronous Delegate Invocation. It uses the Begin/End method call mechanism. Here the run-time uses a thread from thread pool to execute the code and we can have call backs once its completed. Below code explains it well which uses Func delegate.

No change in finding factorial. We simply added new function called WriteFactorialAsyncUsingDelegate() and modified the Main to call this method from the loop.

As soon as the BeginInvoke on findFact delegate is called the main thread goes back to the counter loop, then it increment the counter and continue looping. When the factorial is available the anonymous call back will hit and it will be written into console.

We don't have direct option to cancel the task. Also if we want to wait for one or more methods its little difficult.

Also we can see that the piece of code is not wrapped as object and we need to battle with the IAsyncResult object to get the result back. TPL solves that problem too, It looks more object oriented. Lets have a look.

Improving async programming using TPL

TPL is introduced in .Net 4.0. We can wrap the asynchronous code in a Task object and execute it. We can wait on one or many tasks to be completed. Can cancel task easily etc...There are more to it. Below is a rewrite of our Factorial writing code with TPL.

Here we can see that first task is run then its continuing with next task which is the completed handler which receives notification of first task and writing the result to console.

Still this is not a language feature. We need to refer the TPL libraries to get the support. Main problem here is the effort to write the completed event handler. Lets see how this can be rewritten using async and await keywords.

The language feature async and await

We are going to see how the TPL sample can be rewritten using async and await keywords. We decorated the WriteFactorialAsyncUsingAwait method using async keyword to denote this function is going to do operations in async manner and it may contain await keywords. Without async we cannot await.

Then we are awaiting on the factorial finding function. The moment the await is encountered during the execution, thread goes to the calling method and resume the execution from there. Here in our case the counter loop and takes next number. The awaited code is executed using TPL as its task. As normal it takes a thread from the pool and execute it. Once the execution is completed the statements below the await will be executed.
Here also we are not going to change anything in the FindFactorialWithSimulatedDelay(). 

This avoids the needs for extra call back handlers and developers can write the code in a sequential manner.

What is the relation with Task Parallel Library and async await keywords

The keywords async and await make use of TPL internally. More clearly we can say async and await are syntactic sugar in C# language. Still not clear? In other sense the .Net runtime doesn't know about async and await keywords.

Look at the above disassembled code of  WriteFactorialAsyncUsingAwait(). I used reflector to disassemble the assembly.

Should a language depend on a library/class created with it?

This is a old question. If we look at C or C++, the language was always independent and the libraries were fully depend on it. But if we look from introduction of yield keyword in C#, we can see there is a marriage between language features (keywords) and libraries. Here yield which is a language feature depends on IEnumerable interface created using the language itself. Then the compiler does the magic and replace the yield keyword with corresponding IEnumerable implementation making sure CLR doesn't know about yield keyword.

Another example is using keyword. Its tightly coupled with IDisposable interface. Since then there are many syntactic sugars added more details can be found in below link.

http://blogs.msdn.com/b/ericlippert/archive/2010/10/28/asynchrony-in-c-5-part-one.aspx

Personally I don't prefer mixing language features with libraries. Let language evolve its own and libraries depend on language. If we do the other way the compiler is forced to inject more code and we know adding more lines is not coming in free. But unfortunately we are in a world where coding needs to be fast not the execution of the code.

Should the compiler modify our code?

The main problem with compiler modifying our code is debugging. There are chances that we will see call stack of our application which contains symbols which are not written by us. Try to see the call stack by raising an exception in anonymous method. If there are many anonymous methods in the application, that's it. We are done in debugging.

Should the language know parallel programming and threading?

This is another area to discuss. Since the threading is managed by OS, should the language care about threading. Should the threading be a library or integrated language feature?

Now a days most of the hardware has multiple cores and if the language doesn't provide the integrated features, nobody will leverage multiple cores. This is because either development community is afraid of threading or it requires additional coding time. If the language gives easy way, developers can focus more on the functionality or business side of the app than threading which is infrastructure related.

So I really want my language and associated runtime know parallel and async programming. But please try to avoid tight coupling with class library and stop compiler altering my code.

When should we use it?

We can use async and await anytime we are waiting for something. ie whenever we are dealing with async scenarios. Examples are file IO, network operations, database operations etc...This will help us to make our UI responsive.

So go ahead and make sure your APIs are await-able.

References

https://msdn.microsoft.com/en-us/library/hh191443.aspx
http://stephenhaunts.com/2014/10/10/simple-async-await-example-for-asynchronous-programming/
https://richnewman.wordpress.com/2012/12/03/tutorial-asynchronous-programming-async-and-await-for-beginners/

Tuesday, June 23, 2015

Running Roslyn Analyzers from console application

This post require prior knowledge from my earlier post. In last post we saw how our C# program can compile other C# program via Roslyn compiler. Yes its without Visual Studio. If we are compiling without Visual Studio how a custom Roslyn DiagnosticAnalyzer can be invoked to do analysis on the code fragment which is getting compiled.

Here we are going to see how we can connect with Roslyn custom analyzers when our program is compiling another program using .Net compiler platform.
First we are defining our custom analyzer which is nothing but a check to make sure the type names are starting with capital letter. See the code below. Now lets move on how this analyzer can be integrated to our programmatic compilation using Roslyn.
  1. GetSimpleCompilation()
    This is same as what is there in previous post about simple compilation.
  2. GetAnalyzerAwareCompilation
  3. Here we connect our compilation unit with custom Roslyn diagnostic analyzer.
  4. GetAllDiagnosticsAsync()
  5. This is Roslyn API to execute our custom analyzers on top of source code input and get the results back. The important point here is the compiler platform returns same type ImmutableArray<Diagnostic> every time. ie Regardless of whether its simple compilation or compilation with analyzers.

    If the result from GetAllDiagnosticsAsync is empty list, proceed to actual compilation which produces the assembly file.
  6. ShowDiagnostics()
    This is too same as previous post.

Tuesday, June 16, 2015

Using Roslyn to compile C# source code

Below are the steps write a program to compile C# code snippet using Roslyn (Now .Net Compiler Platform)to produce an assembly. Simply saying a C# program compiling another C# program using Roslyn compiler.

The Roslyn nuget version used here is 1.0.0-rc2. This tutorial is organized in zoom in/out fashion.

There are mainly 3 high level steps and they are getting a compilation unit, compiling the same and processing result / diagnostics it produced. The code snippet shows the steps.

  1. GetSimpleCompilation() Now lets look at what is happening in GetSimpleCompilation. As seen in the code we are getting the source code, creating syntax tree out of that then prepare the references needed for the compilation. Finally Creating CSharpCompilation using all the information.
    1. GetSourceCode()
    2. This method simply returns sample source code to compile.Can be replaced by any other code snippet.
    3. GetSyntaxTree()
    4. This method parses the input code and creates the CSharpSyntaxTree.
    5. GetReferences()
    6. Compilation always needs what are the references needed. Here its very basic as we are not using any advanced features.
  2. Compile()
  3. Now its the time to compile. It needs to get the assembly file name to proceed. Since the GetOutpuFilePath() is just simply returning the dll file path, omitting it.
  4. ShowDiagnostics()
  5. This method shows any warnings in compilation.
The compiled assembly can be kept in memory and execute from there. Please refer this link for more details.

Happy coding...