Damaris is a middleware for multicore SMP nodes that lets them handle data transfers for storage and visualization efficiently by dedicating one or a few cores to application I/O. It is developed in a collaboration between the KerData team at INRIA Rennes – Bretagne Atlantique and the JLPC. The current version supports efficient asynchronous I/O and hides all I/O-related overheads, such as data compression and post-processing. It was evaluated with the CM1 tornado simulation, one of the Blue Waters target applications. Future work targets fast direct access to the data of running simulations and efficient I/O scheduling.
Download Damaris: http://damaris.gforge.inria.fr/doku.php
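The dedicated-core idea described above can be illustrated with a minimal Python sketch. It does not use the Damaris API: the names io_worker and run_simulation are hypothetical, a thread stands in for the dedicated core, and zlib compression stands in for the post-processing and storage work.

```python
import queue
import threading
import zlib


def io_worker(tasks, results):
    """Hypothetical dedicated I/O worker: drains the queue until a sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: the simulation has finished
            break
        step, payload = item
        # Stand-in for compression, post-processing and the actual write.
        results[step] = zlib.compress(payload)


def run_simulation(num_steps):
    """Produce one data block per step and hand it to the worker without waiting."""
    tasks = queue.Queue()
    results = {}
    worker = threading.Thread(target=io_worker, args=(tasks, results))
    worker.start()
    for step in range(num_steps):
        data = bytes([step % 256]) * 1024  # stand-in for one iteration's output
        tasks.put((step, data))            # hand off, then continue computing
    tasks.put(None)
    worker.join()
    return results
```

The compute loop only pays for the enqueue operation; compression and storage happen in the worker, which is the overhead-hiding effect the description refers to.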
FTI (Fault Tolerant Interface)
FTI was initially designed and developed in a collaboration with Titech. FTI provides an API and a library for application-level checkpoint/restart. The API works at the data-structure level: the programmer decides which data to protect and when the protection should be performed. The library provides transparent multi-level checkpoint/restart with 4 or 5 levels of checkpointing.
Download FTI: http://leobago.com/projects/
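To make the data-structure-level approach concrete, here is a small Python sketch of application-level checkpointing. It is not the FTI API: the Checkpointer class, its methods, and the use of one directory per level are assumptions made for illustration only.

```python
import os
import pickle
import tempfile


class Checkpointer:
    """Hypothetical application-level checkpointer with one directory per level."""

    def __init__(self, level_dirs):
        self.level_dirs = list(level_dirs)
        self.protected = {}

    def protect(self, name, buffer):
        # The programmer chooses which data to protect (here: lists mutated in place).
        self.protected[name] = buffer

    def _path(self, level):
        return os.path.join(self.level_dirs[level], "ckpt.pkl")

    def checkpoint(self, level):
        # The programmer also chooses when to checkpoint and at which level.
        snapshot = {name: list(buf) for name, buf in self.protected.items()}
        tmp_path = self._path(level) + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(snapshot, f)
        os.replace(tmp_path, self._path(level))  # atomic rename: no partial files

    def recover(self):
        # Simplification: use the first level that holds a checkpoint file.
        for level in range(len(self.level_dirs)):
            path = self._path(level)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    snapshot = pickle.load(f)
                for name, buf in self.protected.items():
                    buf[:] = snapshot[name]
                return level
        return None


def demo():
    with tempfile.TemporaryDirectory() as local_dir, \
            tempfile.TemporaryDirectory() as remote_dir:
        ckpt = Checkpointer([local_dir, remote_dir])
        grid = [1.0, 2.0, 3.0]
        ckpt.protect("grid", grid)
        ckpt.checkpoint(level=1)
        grid[0] = 99.0                 # simulate corrupted or lost state
        restored_from = ckpt.recover()
        return restored_from, grid
```

In this sketch the levels differ only by target directory; a real multi-level library would also differ in cost and resilience per level.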
HELO (Hierarchical Event Log Organizer): This tool classifies system messages according to their description patterns in a two-step hierarchical clustering process. In the experiments we compare our tool with two Apriori-based tools (Loghound, SLCT), another pattern extractor (IPLOM) and an affinity propagation technique (StrAp). High precision is important at this stage, because the next step analyzes the temporal and spatial characteristics of the groups of events found here, as well as the correlations among them.
Download HELO: http://www.ana-gainaru.com/software/helo1.2.zip
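As a rough illustration of grouping messages by description pattern, the following Python sketch uses a much simpler stand-in than HELO's clustering: it first groups messages by token count and first token, then replaces every position that varies within a group by a wildcard. The function name extract_templates and the sample messages are hypothetical.

```python
from collections import defaultdict


def extract_templates(messages):
    """Return (template, count) pairs; varying tokens become '*'."""
    # Step 1: coarse grouping by number of tokens and first token.
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        if tokens:
            groups[(len(tokens), tokens[0])].append(tokens)

    # Step 2: within each group, keep constant tokens and wildcard the rest.
    templates = []
    for key in sorted(groups):
        members = groups[key]
        template = [
            column[0] if len(set(column)) == 1 else "*"
            for column in zip(*members)
        ]
        templates.append((" ".join(template), len(members)))
    return templates


sample = [
    "node 12 failed",
    "node 7 failed",
    "link up on port 3",
    "link up on port 9",
]
templates = extract_templates(sample)
```

Each resulting template identifies one group of events, which is the kind of unit a later temporal or spatial analysis would operate on.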
SD-FT: Implementation in MPICH2 of a fault tolerance protocol for send-deterministic applications. Status: prototype under evaluation.
BlobCR (BlobSeer-based Checkpoint-Restart) is a checkpointing framework specifically optimized for MPI applications that need to be ported to IaaS clouds. It introduces a dedicated checkpoint repository that can efficiently take incremental snapshots of the whole disk attached to each virtual machine instance; these snapshots can be used either directly at application level or transparently at system level. Transparent system-level checkpointing is provided through a modified MPICH2 library that must be installed inside the virtual machine images. It relies on BLCR to save the state of the MPI processes to the virtual disk, which is then snapshotted with BlobCR.
TreeMatch algorithm implemented as a load balancer for Charm++. TreeMatch (which predates the collaboration) maps processes to computing elements based on the hierarchical topology of the target environment and on the communication pattern of the processes. We have found that on large instances the TreeMatch algorithm outperforms the greedy load balancer of Charm++. However, the algorithm only balances communication and does not take into account the topology and memory hierarchy. In March 2011 we worked on improving these two aspects: we enhanced our solution and designed a load balancer that is load-, topology- and communication-aware.
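The idea of communication-aware placement can be sketched in a few lines of Python. This is a simplified greedy grouping for a single topology level, not the TreeMatch algorithm; the function names and the communication matrix below are made up for illustration.

```python
def group_by_communication(comm, group_size):
    """Greedily place processes that exchange the most data in the same group.

    comm is a symmetric matrix: comm[i][j] is the volume exchanged between
    processes i and j. Each group stands for one computing element (e.g. a node).
    """
    unassigned = set(range(len(comm)))
    groups = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        group = [seed]
        while len(group) < group_size and unassigned:
            # Pick the process with the highest traffic to the current group.
            best = max(
                sorted(unassigned),
                key=lambda p: sum(comm[p][q] for q in group),
            )
            unassigned.remove(best)
            group.append(best)
        groups.append(group)
    return groups


def cut_cost(comm, groups):
    """Total communication volume that crosses group boundaries."""
    owner = {p: g for g, members in enumerate(groups) for p in members}
    n = len(comm)
    return sum(
        comm[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if owner[i] != owner[j]
    )


comm = [
    [0, 1, 10, 0],
    [1, 0, 0, 10],
    [10, 0, 0, 1],
    [0, 10, 1, 0],
]
placement = group_by_communication(comm, group_size=2)
```

With this matrix the communication-aware placement keeps the heavy pairs together, so far less traffic crosses group boundaries than with the naive placement [[0, 1], [2, 3]]. A load-aware variant would additionally weigh each process by its compute cost when filling groups.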