It started with queued processing in one of our projects. We have a queue framework for long-running processes. Anyone who needs to use the queue must host a web service, which the queue framework calls after it de-queues a message. The queue type and the service URL need to follow a convention so that the framework can find the service.
In a technology shop where developers follow principles such as SOLID, this problem would probably never occur. Here, in a delivery shop, the problem was in the service method that performs the long-running operation: people started writing methods of 2000+ lines.
Maybe the developers were securing their jobs. Soon each code base became the monopoly of its developers, and nobody else was able to touch it.
It went unnoticed for a couple of years. It became a problem once people left the team and we started getting failures in really long-running (more than 1 hour) queue processes. A failed process was supposed to resume on requeue, but none of them resumed from the point of failure. Making them resumable became a nightmare.
Why did you develop it in a sequential way? The developers answered: "This is the way we know. We need to perform a sequence of operations on the de-queued message. The output of one operation may affect a subsequent operation; sometimes they are independent. If required, we can separate them into functions."
It is clear that the developers were not following the Single Responsibility Principle (SRP). Separating the steps into methods doesn't help here, because some long-running operations share steps. So it's time to enforce SRP.
Another problem is the way the queue framework relays control to the actual long-running program. It was a simple service call based on convention. That made it easy for developers to write a service and put all the logic inside the implementation method.
The solution is for the queue framework to accept step objects and call them one by one instead of calling a single service method. After each step, the Orchestrator can store the state, so that it can restart from the last step. The steps can also be unit tested easily.
The Orchestrator was born here.
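The idea can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the actual .NET framework: the names Step, Orchestrator, Download, Transform and Publish, and the JSON checkpoint file, are all assumptions made up for the example. It only shows steps being called one by one, with the state stored after each successful step so that a rerun resumes at the step that failed.

```python
import json
import os


class Step:
    """Hypothetical base class: each step does exactly one thing."""

    def execute(self, context):
        raise NotImplementedError


class Orchestrator:
    """Runs steps in order and checkpoints after each successful step."""

    def __init__(self, steps, checkpoint_path):
        self.steps = steps
        self.checkpoint_path = checkpoint_path

    def _load(self):
        # No checkpoint file means a fresh run.
        if not os.path.exists(self.checkpoint_path):
            return None
        with open(self.checkpoint_path) as f:
            saved = json.load(f)
        return saved["next_step"], saved["context"]

    def _save(self, next_step, context):
        with open(self.checkpoint_path, "w") as f:
            json.dump({"next_step": next_step, "context": context}, f)

    def run(self, initial_context=None):
        saved = self._load()
        if saved is None:
            start, context = 0, dict(initial_context or {})
        else:
            start, context = saved  # resume: ignore initial_context
        for index in range(start, len(self.steps)):
            # If this raises, the checkpoint still points at `index`.
            self.steps[index].execute(context)
            self._save(index + 1, context)
        if os.path.exists(self.checkpoint_path):
            os.remove(self.checkpoint_path)  # finished, nothing to resume
        return context


# Hypothetical steps, only for demonstration.
class Download(Step):
    def execute(self, context):
        context["log"].append("download")


class Transform(Step):
    def __init__(self, failures_left=0):
        self.failures_left = failures_left  # simulates a transient failure

    def execute(self, context):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RuntimeError("simulated failure")
        context["log"].append("transform")


class Publish(Step):
    def execute(self, context):
        context["log"].append("publish")
```

In this sketch, a first call to run() that fails inside Transform leaves a checkpoint pointing at that step; calling run() again (the requeue) skips Download and continues from Transform. Because each step is a small object with a single execute method, it can be unit tested without the queue.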
Mutability - Pipelined operations on the same context object
Another concern was managing the state. These operations shared the same state, and sometimes one step required the output of a previous one, not necessarily the immediately preceding step.
Mutability is widely considered evil, and people strive for immutable systems. But here that was a little difficult to achieve, so we continued with mutability.
But how to control the flow better?
Why should a step ever be stateful if its Execute(context) method gets the context via a parameter? If the Orchestration engine passes the same context object to every step, each step can modify the state, and there is no parallel step execution, we are good. Aren't we?
Yes, we are good without parallelism.
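A small sketch of that shared-context idea, again in Python purely for illustration. The steps ParseOrder, ApplyDiscount and BuildInvoice, the context keys and the amounts are invented for the example and are not part of the framework. The steps hold no state of their own; everything lives in the one context object that is passed from step to step, and the last step reads output written two steps earlier.

```python
class ParseOrder:
    def execute(self, context):
        # Writes "order" into the shared context.
        context["order"] = {"id": context["message"], "amount": 120}


class ApplyDiscount:
    def execute(self, context):
        # Reads the output of the immediately preceding step.
        context["discount"] = 20 if context["order"]["amount"] > 100 else 0


class BuildInvoice:
    def execute(self, context):
        # Reads "order" from ParseOrder, which is not the immediately
        # preceding step, plus "discount" from ApplyDiscount.
        context["invoice_total"] = context["order"]["amount"] - context["discount"]


def run_sequentially(steps, context):
    """Passes the same mutable context to every step, strictly one at a time."""
    for step in steps:
        step.execute(context)
    return context


result = run_sequentially(
    [ParseOrder(), ApplyDiscount(), BuildInvoice()],
    {"message": "order-1"},
)
print(result["invoice_total"])  # 100
```

Since a step depends only on what is in the context, it can be tested in isolation by handing it a hand-built context. The mutation is safe here only because the steps never run in parallel.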
Concurrency & Parallel step execution
Parallel processing is always difficult to get right. The first thing to decide is who defines what runs in parallel: the Orchestrator framework or the step authors? It's a debatable topic, and most probably we will end up with the step developers telling us what to run in parallel. What will happen if developers who don't know SRP define the unit of parallelism?
Most of the time stability is better than speed. So no concurrency as of now.
Being an architect, I can suggest the solution, but I cannot enforce it as long as I don't have the right to hold a release. There is no problem, though, in developing such a framework and open sourcing it.
Note: This is for .NET 4.0 users. If the target is .NET 4.5, there is a better library called Banzai.