Overview
Memory management in sequential (single threaded) applications is fairly straightforward. There are three basic types of storage: stack, heap, and static (file scope) data. Fixed size transient data is typically allocated from the stack, variable sized transient data is typically allocated from the heap, and persistent data is either stored as static (file scope) data or allocated from the heap.
In the case of multi-threaded and/or multi-process applications, however, the situation requires more consideration. Applications that are written for current single threaded environments may not scale to shared memory architectures, and even those written for current shared memory multi-core processors may not scale to next generation distributed memory processors. In all cases, a pointer is only valid within the memory space that created it, so pointers must be shared with care.
Data that is shared between processes needs to be moved, but moving data between threads in the same process is an unnecessary and expensive overhead. The CORE runtime provides mechanisms that allow data to be shared by reference, and only moved when accessed from a component in a disparate memory space (e.g. another machine on a network).
The CORE runtime memory management service has been specialized for the particular requirements of concurrent programming. In addition to the conventional dynamic memory allocation services, CORE provides a number of extensions that allow fast operation in multi-core environments. It also provides a Global Shared Memory (GSM) object model that allows objects in remote processes to be accessed by reference transparently. This means that applications written for current state-of-the-art multi-core SMPs will execute without modification across clusters and, more importantly, on next generation distributed memory many-core architectures.
Conventional Storage Mechanisms
Concurrent programming imposes a number of restrictions on the standard storage mechanisms.
File Scope/Static Data
The use of file scope data is discouraged for several reasons. If it is shared between threads then it will typically involve the use of low-level locking. More importantly, it cannot be shared across distributed memory architectures. File scope data is a convenience in sequential programming but is not actually required.
Stack/Automatic Data
Stack data is used in exactly the same way as it is with standard sequential code, and so most processing code can be migrated to Blueprint without modification. In a multi-core environment, however, objects may need to be shared between multiple threads and possibly between processes.
Whilst objects that are allocated from the stack can be shared between threads, it is not recommended. If an object is allocated from the stack owned by one thread, then it will become invalid when the owning thread returns from the function that allocated it. Sharing therefore requires additional logic so that the owning thread will block (not return) until all of its sharers have finished with the data. This can be cumbersome and error prone.
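The blocking requirement described above can be illustrated with standard C++ threads. This is a minimal sketch rather than CORE code: the function name sum_shared_stack_buffer and the constant kCount are hypothetical, and std::thread stands in for whatever threading mechanism the application uses.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

constexpr unsigned kCount = 1000;  // hypothetical buffer size for the sketch

// Sums a buffer that lives on the caller's stack using `workers` threads.
// The owning thread must not return before every sharer has finished.
long sum_shared_stack_buffer(unsigned workers) {
    assert(workers > 0);
    int data[kCount];                     // automatic storage owned by this thread
    std::iota(data, data + kCount, 1);    // fill with 1, 2, ..., kCount

    std::vector<long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, w, workers] {
            // Each sharer reads a strided slice of the owner's stack buffer.
            for (unsigned i = w; i < kCount; i += workers) partial[w] += data[i];
        });
    }

    // The owning thread blocks here. Returning any earlier would leave the
    // sharers reading a stack frame that no longer exists.
    for (auto& t : pool) t.join();

    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

The join loop is the "additional logic" in question: forgetting it, or returning early on an error path, turns every access in the worker threads into undefined behaviour.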
Heap Data
Conventional heap allocation is fully supported, but when it is heavily used in a multi-core environment it will typically perform poorly due to lock contention. This applies to malloc and free, new and delete, and other mechanisms that allocate from the heap, such as the STL containers.
Additional Storage Mechanisms
CORE provides a number of additional mechanisms that allow data to be allocated, shared and exchanged between threads and/or processes.
Process Scope Objects
In the case of shared memory platforms, lock contention can be a major bottleneck. In practice, however, many allocations are actually thread specific and the locking overhead can therefore be greatly reduced. The restriction imposed by CORE's thread specific allocation is that only the owning thread can allocate from its heap (but any thread can free any block from any heap). Thread specific heaps are allocated from the main process heap, and calls are provided that enable heap usage to be monitored and accounted. Objects allocated from a thread specific heap have process scope and so they can be shared across atomic circuits. CORE STL provides a portable means of managing data at atomic circuit scope. For more details of thread specific memory allocation see Dynamic Memory Allocation, Overloading New and Delete, and the Standard Template Library.
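To make the ownership rule concrete, the following is a minimal sketch of a thread specific heap in standard C++. It is not the CORE API: the class ThreadHeap and its members allocate, release, bytes_used and live_blocks are hypothetical names, and the sketch simplifies heavily by using a bump allocator that never recycles individual blocks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>

// Hypothetical illustration of a thread specific heap; not the CORE API.
class ThreadHeap {
public:
    // The arena is carved out of the main process heap by the owning thread.
    explicit ThreadHeap(std::size_t capacity)
        : arena_(new unsigned char[capacity]), capacity_(capacity),
          owner_(std::this_thread::get_id()) {}

    // Only the owning thread may allocate, so the bump offset needs no lock.
    void* allocate(std::size_t bytes) {
        assert(std::this_thread::get_id() == owner_);
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > capacity_ - used_) return nullptr;  // heap exhausted
        void* block = arena_.get() + used_;
        used_ += rounded;
        live_blocks_.fetch_add(1, std::memory_order_relaxed);
        return block;
    }

    // Any thread may free a block. This sketch only updates the accounting;
    // the arena storage itself is reclaimed when the heap is destroyed.
    void release(void* block) {
        if (block) live_blocks_.fetch_sub(1, std::memory_order_release);
    }

    // Usage monitoring. bytes_used() is meant to be called by the owner only.
    std::size_t bytes_used() const { return used_; }
    std::size_t live_blocks() const {
        return live_blocks_.load(std::memory_order_acquire);
    }

private:
    std::unique_ptr<unsigned char[]> arena_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::atomic<std::size_t> live_blocks_{0};
    std::thread::id owner_;
};
```

The point of the design is that the common path (allocation by the owner) touches no lock at all, while the less common cross-thread free costs a single atomic operation. A production allocator would also return freed blocks to the owner for reuse.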
CLIP threads, call-backs, methods and circuits can all contain 'Workspace' data. Workspace is persistent and is conserved between object invocations. Typically, workspace is used to hold tables of constant data (e.g. FFT twiddle factors) but can also be used for transient storage that is too large to create from thread stack.
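The twiddle factor case can be sketched in plain C++ as an object whose table is built once and reused on every invocation. This is an illustration of the idea only and does not use CLIP syntax: the class name DftMethod is hypothetical, and a naive O(n^2) discrete Fourier transform stands in for a real FFT to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical method object; the twiddle table plays the role of workspace.
class DftMethod {
public:
    // The workspace is filled once, when the object is created.
    explicit DftMethod(std::size_t n) : twiddle_(n) {
        const double two_pi = 2.0 * std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k) {
            twiddle_[k] = std::polar(
                1.0, -two_pi * static_cast<double>(k) / static_cast<double>(n));
        }
    }

    // Every invocation reuses the table instead of recomputing sines and
    // cosines. Naive O(n^2) DFT, used here only for brevity.
    std::vector<Complex> operator()(const std::vector<Complex>& in) const {
        const std::size_t n = twiddle_.size();
        assert(in.size() == n);
        std::vector<Complex> out(n);
        for (std::size_t k = 0; k < n; ++k) {
            for (std::size_t j = 0; j < n; ++j) {
                out[k] += in[j] * twiddle_[(j * k) % n];
            }
        }
        return out;
    }

private:
    std::vector<Complex> twiddle_;  // constant table conserved between calls
};
```

Because the table is never modified after construction, invocations do not need to synchronise on it, which is what makes constant tables a natural fit for workspace.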
Global Shared Memory (GSM) Scope Objects
Objects that are shared across distributed memory (or have the potential to be) should use store objects. CLIP provides transient stores and persistent stores that are used to exchange information between circuits. If circuits are located in the same process at runtime they will exchange all information by reference, but if they are accreted to separate processes, they transparently move and cache 'store' data. When transient store data is no longer referenced, all instances are 'freed'. This means that applications run repeatably on shared memory and distributed memory platforms without modification.
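The share-by-reference or move-and-cache behaviour can be modelled in standard C++ with reference counted pointers. This is a sketch under stated assumptions, not the CLIP store interface: Payload, StoreRef and deliver are hypothetical names, the same_process flag stands in for the runtime's knowledge of circuit placement, and a plain copy stands in for serialising, transmitting and caching the data.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Payload = std::vector<double>;
using StoreRef = std::shared_ptr<const Payload>;  // read-only shared reference

// Hypothetical delivery step between two circuits.
StoreRef deliver(const StoreRef& source, bool same_process) {
    assert(source != nullptr);
    if (same_process) {
        return source;  // same memory space: exchange by reference, no copy
    }
    // Different memory space: the receiver gets its own cached instance.
    // A real runtime would serialise and transmit; a copy models the result.
    return std::make_shared<const Payload>(*source);
}
```

In this model each instance is released automatically when the last reference to it is dropped, which mirrors the way transient store data is freed once it is no longer referenced. The consuming code is identical in both cases; only the placement decision differs.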
In addition to workspace objects, methods also support persistent state. Like workspace, method state is conserved between invocations, but in the case of state, the scheduler will ensure that slave processes have the latest version of state before the method instance executes. Since there is a latency associated with this operation, state should only be used when information needs to be conserved between successive invocations. An example might be a rolling average. In most cases, workspace is preferable.
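The rolling average is a useful example because each result depends on the values seen by earlier invocations, so the latest state must be available wherever the method next runs. A minimal sketch in plain C++ follows; the class name RollingAverage is hypothetical, and the state synchronisation performed by the scheduler is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical method whose members play the role of method state.
class RollingAverage {
public:
    explicit RollingAverage(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    // Each invocation updates the state and returns the mean of the most
    // recent `window` samples (or of all samples seen so far, if fewer).
    double operator()(double sample) {
        samples_.push_back(sample);
        sum_ += sample;
        if (samples_.size() > window_) {
            sum_ -= samples_.front();
            samples_.pop_front();
        }
        return sum_ / static_cast<double>(samples_.size());
    }

private:
    std::size_t window_;
    std::deque<double> samples_;  // state: must survive between invocations
    double sum_ = 0.0;            // state: running total of the window
};
```

By contrast, a table of constants could be rebuilt identically in any process, which is why it belongs in workspace and avoids the synchronisation latency.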