Deadlocks

Deadlocks occur in a system when two (or more) processes are each waiting for a resource that is held by the other. Neither can run, so neither can free the resource that the other needs.

Deadlocks are a serious issue in multi-threaded/multi-processor software development, but developers rarely spot them because they require a precise set of circumstances to occur in a precise order. Those circumstances rarely arise during debugging or development, because the timing in a debugger or on a development network differs from that of a live system. However, Murphy's Law predicts that 'Anything that can go wrong, will go wrong, and at the most inappropriate moment', i.e. once the system has gone live with a large number of users.

CDL does not prevent deadlocks (although it does limit the number of places where they can occur). It does make it easier to visualise where they can occur so that they can be dealt with. The examples below show circuitry where deadlocks could occur and detail the steps to prevent them.

Examples

Multiple ASTs

When several methods/threads need to open a number of ASTs for write/update, a predefined order must be stipulated. The order is arbitrary (alphabetical is as good as any other), but developers must always open ASTs in that order. This ensures that at least one method/thread can run and obtain the last AST; the others will block only transiently.

NB. The problem only occurs if multiple ASTs are held open at the same time. If ASTs are opened and closed one at a time, there is no problem.
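
The ordering rule can be sketched in standard C++, outside CDL. This is only an illustration: std::mutex stands in for an AST, and StoreSet, CanonicalOrder and WithStoresOpen are hypothetical names invented for the sketch, not part of the CDL API.

```cpp
#include <algorithm>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a group of ASTs: one std::mutex per store name.
using StoreSet = std::map<std::string, std::mutex>;

// Return the names in the agreed (here: alphabetical) order, duplicates removed.
std::vector<std::string> CanonicalOrder(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    names.erase(std::unique(names.begin(), names.end()), names.end());
    return names;
}

// Open every named store in the agreed order, run 'work', then release them all.
// Throws std::out_of_range if a name is not present in 'stores'.
template <typename Work>
void WithStoresOpen(StoreSet& stores, const std::vector<std::string>& names, Work work)
{
    std::vector<std::unique_lock<std::mutex>> held;
    for (const std::string& name : CanonicalOrder(names))
        held.emplace_back(stores.at(name));   // may block, but only transiently
    work();
}   // 'held' goes out of scope here and every store is released
```

Two callers that ask for { "Alpha", "Beta" } and { "Beta", "Alpha" } both end up opening Alpha first, so neither can be holding Beta while it waits for Alpha.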

Auxiliary Connections

When a method makes an auxiliary connection to a store (or any object) and waits for an event, the wait will cause the Method's worker thread to block. All other Methods being managed by that thread will also block. This will probably include the Method that needs to free up the store. Several actions can be used to prevent the deadlock.

Method priority - adjusting the priority of either Method will result in CDL creating a separate worker thread for each Method. However, each thread could still contain other Methods at these priorities.
Buffer depth - ensure that there is sufficient depth in the Event store so that Method A never blocks trying to write to it.
Timeouts - use a timeout (preferably 0 - CLP_POLL) on the event so that if the store can't be opened, the Method doesn't block. Whenever a timeout is used on a connection, the Method (or Thread) needs to handle the 'failed' event by taking some recovery action. This may simply be raising a ClpDiag and throwing away the event; however, this is highly dependent on the application's requirements.

Eg.

if ( EventSstCxn.OpenWrite( 0, CLP_NO_RULE, CLP_POLL ) )
{
   // write to Event store
   EventSstCxn.Close();
}
else
{
   ClpDiag( "Event Store buffer full" );
   // don't close the connection - it wasn't opened
}

NB. This is less of an issue on multi-core and networked applications, as Method B could be running in a separate worker thread (of equal priority) on another core/processor. But it might not be!
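
The way buffer depth and a polling write interact can be sketched in standard C++. This is a simplified model rather than CDL code: BoundedEventStore, TryWrite, TryRead and WriteBurst are hypothetical names, and the store is assumed to behave like a fixed-depth queue.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical model of an event store with a fixed buffer depth.
class BoundedEventStore {
public:
    explicit BoundedEventStore(std::size_t depth) : depth_(depth) {}

    // Polling write: returns false at once if the buffer is full.
    bool TryWrite(const std::string& event)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (events_.size() >= depth_) return false;
        events_.push_back(event);
        return true;
    }

    // Polling read: returns false at once if the buffer is empty.
    bool TryRead(std::string& out)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (events_.empty()) return false;
        out = events_.front();
        events_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<std::string> events_;
    std::mutex mutex_;
};

// Writes 'attempts' events without ever blocking and returns how many
// had to be discarded because the buffer was full.
std::size_t WriteBurst(BoundedEventStore& store, std::size_t attempts)
{
    std::size_t dropped = 0;
    for (std::size_t i = 0; i < attempts; ++i)
        if (!store.TryWrite("event " + std::to_string(i)))
            ++dropped;   // recovery action: count and throw away the event
    return dropped;
}
```

With a depth of 2 and nobody reading, a burst of 5 writes drops 3 events. Increasing the depth reduces the number of drops, and the polling write means the writer never stalls its worker thread.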

Feedback

When a method makes multiple connections to a TST (for either read or write), the 'Buffer Depth' of the store must be at least equal to the number of connections the method uses.

NB. Trigger connections are held open for the whole duration the method is executing.
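
The rule can be checked with a small model in standard C++. It assumes that each open connection (including a trigger connection) occupies one slot of the store's buffer until it is closed; that is an assumption made for this sketch, and SlotStore and CanHoldAll are hypothetical names, not CDL API.

```cpp
#include <cstddef>

// Hypothetical model: a store whose buffer depth limits how many
// connections can be open on it at the same time.
class SlotStore {
public:
    explicit SlotStore(std::size_t depth) : free_(depth) {}

    // Non-blocking open: fails if every slot is already in use.
    bool TryOpen()
    {
        if (free_ == 0) return false;
        --free_;
        return true;
    }

    void Close() { ++free_; }

private:
    std::size_t free_;
};

// One method tries to hold 'wanted' connections open at the same time.
// Returns false if an open failed, i.e. a blocking open would have waited
// for a slot that only this same method could release.
bool CanHoldAll(SlotStore& store, std::size_t wanted)
{
    std::size_t opened = 0;
    while (opened < wanted && store.TryOpen())
        ++opened;
    const bool ok = (opened == wanted);
    for (; opened > 0; --opened)
        store.Close();   // release whatever was opened
    return ok;
}
```

With a depth of 2, CanHoldAll(store, 2) succeeds and CanHoldAll(store, 3) fails: the third open could only succeed after the method closed one of its own connections.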

Collection Order

In this example collectors AandB and BandC each collect events from two stores. In particular, they both collect from B. This is a non-distributed (competed) connection, so whichever collector gets to the event first takes it. In this example both collectors try to collect B first, before A or C. Thus if B becomes available, either AandB or BandC will collect it and move on. Let's assume it's AandB. If A becomes available there is no problem: AandB completes its collection and passes the event to AB_or_BC and MthdInst1. However, if A never becomes available, AandB will never complete. If C were to become available, it would never be collected, as BandC is blocked waiting for B.

Changing the order of the collectors so that the competed connection is always last ensures a fair fight when trying to collect B. (Increasing the Buffer Depth of B would also help, but hides the logic.)
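
The effect of the collection order can be reproduced with a small single-threaded simulation in standard C++. It is a sketch under simplifying assumptions: each collector takes its inputs strictly in the listed order, the collector listed first wins any contest for an event, and buffer depth is ignored. CollectionOrder and CountCompletedCollections are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// The stores one collector reads from, in the order it collects them.
// Each collector is assumed to list at least one store.
using CollectionOrder = std::vector<std::string>;

// Feeds 'arrivals' (store names, one event each) to the collectors and
// returns how many collections ran to completion.
int CountCompletedCollections(const std::vector<CollectionOrder>& collectors,
                              const std::vector<std::string>& arrivals)
{
    std::map<std::string, int> available;               // events waiting per store
    std::vector<std::size_t> next(collectors.size(), 0); // input each collector waits on
    int completed = 0;

    for (const std::string& store : arrivals) {
        ++available[store];
        bool progress = true;
        while (progress) {                               // run until nobody can move
            progress = false;
            for (std::size_t i = 0; i < collectors.size(); ++i) {
                int& count = available[collectors[i][next[i]]];
                if (count == 0) continue;                // still waiting on this store
                --count;                                 // take the event
                progress = true;
                if (++next[i] == collectors[i].size()) { // all inputs collected
                    next[i] = 0;
                    ++completed;
                }
            }
        }
    }
    return completed;
}
```

With events arriving on B and then C (and never on A), the orders { "B", "A" } and { "B", "C" } complete no collections, because AandB takes B and then waits for A. With the competed store last, { "A", "B" } and { "C", "B" }, the same arrivals let BandC complete once.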

Aborting

In this case TST A is distributed to both the start (A) and the end (M) of the chain. If one of the methods in the chain were to abort its output, then M would never receive it.

If the buffer depth of A and/or the reentrancy of the Dbx is 1 (or less than the number of processes in the chain), then methods A, B etc. will not be able to operate on subsequent events, as they are being kept idle by TST A or the Dbx.

Increasing the Buffer Depth of A and reentrancy of the Dbx will unlock the chain. However, the events arriving at M will not be a matching pair. (see Hints Performance)

Complex Situations

All of the examples above show relatively simple scenarios. In practice these situations will occur with other objects in between or spread across circuits. You need to look carefully.
