What is FTI?
===

FTI stands for Fault Tolerance Interface. It is a library that aims to give
computational scientists the means to perform fast and efficient multilevel
checkpointing on large-scale supercomputers. FTI leverages local storage,
together with data replication and erasure codes, to provide several levels of
reliability and performance. FTI performs application-level checkpointing and
lets users select which datasets need to be protected, in order to improve
efficiency and avoid wasting space, time and energy. In addition, it offers a
direct data interface so that users do not need to deal with file or directory
names. All metadata is managed by FTI transparently. If desired, users can
dedicate one process per node to overlap the fault tolerance workload with
scientific computation, so that post-checkpoint tasks are executed
asynchronously.

---

Download, compile and install FTI (as easy as 1,2,3)
===

 1) git clone https://github.com/leobago/fti.git
 2) mkdir fti/build && cd fti/build
 3) cmake -DCMAKE_INSTALL_PREFIX:PATH=/install/here/fti .. && make all install
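
The configure command in step 3 changes only in its install prefix and in any extra CMake options. As a minimal sketch, the helper below prints that command so it can be checked before running it. The function name `fti_configure_cmd` and the path `/opt/fti` are assumptions made for illustration and are not part of FTI.

```shell
# Hypothetical helper (not part of FTI): print the configure command from
# step 3 for a given install prefix, followed by any extra CMake arguments.
fti_configure_cmd() {
    prefix="${1:?install prefix required}"
    shift
    printf 'cmake -DCMAKE_INSTALL_PREFIX:PATH=%s' "$prefix"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf ' ..\n'
}

# Preview the command for an install under /opt/fti (an example path).
fti_configure_cmd /opt/fti
# Same, with an extra option taken from the remarks in this README.
fti_configure_cmd /opt/fti -DENABLE_OPENSSL=true
```

The helper only prints; the printed line is meant to be run from inside `fti/build`, followed by `make all install` as in step 3.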

> **REMARK 1** (Intel and GCC)
> If both the **Intel and GCC** compilers are installed, please configure using:
> `cmake -C ../intel.cmake -DCMAKE_INSTALL_PREFIX:PATH=/install/here/fti ..`

> **REMARK 2** (OpenSSL)
> To use OpenSSL rather than the built-in MD5, please configure using:
> `cmake -DENABLE_OPENSSL=true -DCMAKE_INSTALL_PREFIX:PATH=/install/here/fti ..`

> **REMARK 3** (GNU versions)
> Using different GNU compiler versions for C and Fortran currently leads to undefined behavior. Please make sure the compiler identification for C and Fortran is the same.

> **REMARK 4** (Cray System)
> FTI works on Cray systems with the following modules.
> GNU environment:
> `module load gcc/5.3.0 CMake/3.6.2 craype/2.5.8 cray-mpich/7.5.0 PrgEnv-gnu/6.0.3 `
> `export CRAY_CPU_TARGET=x86-64`
> `export CRAYPE_LINK_TYPE=dynamic`
> Flag for CMake: `-DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment`
>
> Intel environment:
> `module load intel/17.0.1.132 CMake/3.6.2 craype/2.5.8 cray-mpich/7.5.0 PrgEnv-intel/6.0.3`
> `export CRAY_CPU_TARGET=x86-64`
> `export CRAYPE_LINK_TYPE=dynamic`
> Flag for CMake: `-DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment`
>
> The CMake version matters most: the newer, the better.
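
The two Cray environments differ only in the compiler module and the `PrgEnv` module. The sketch below prints the setup commands from REMARK 4 for one environment at a time. The function name `cray_fti_env` is an assumption made for illustration, and the module versions are copied from the remark, so they may not exist on a given system.

```shell
# Hypothetical helper (not part of FTI): print the setup commands listed in
# REMARK 4 for the chosen Cray programming environment ("gnu" or "intel").
cray_fti_env() {
    case "$1" in
        gnu)   compiler="gcc/5.3.0";        prgenv="PrgEnv-gnu/6.0.3" ;;
        intel) compiler="intel/17.0.1.132"; prgenv="PrgEnv-intel/6.0.3" ;;
        *)     echo "usage: cray_fti_env gnu|intel" >&2; return 1 ;;
    esac
    echo "module load $compiler CMake/3.6.2 craype/2.5.8 cray-mpich/7.5.0 $prgenv"
    echo "export CRAY_CPU_TARGET=x86-64"
    echo "export CRAYPE_LINK_TYPE=dynamic"
}

# Show the commands for the GNU environment without executing them.
cray_fti_env gnu
```

On a Cray login node the output could be applied with `eval "$(cray_fti_env gnu)"`, assuming the `module` command is available in the current shell. The system name is then passed to CMake as a cache variable: `-DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment`.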

---

Configure and run an FTI example
===

The "build/examples" directory contains heat distribution simulations as simple
examples in both C and Fortran. Usage instructions are in the file
"examples/README".

---

User manual
===
FTI is bundled with guides for users, contributors and maintainers alike.
The complete documentation is hosted on [ReadTheDocs](https://fault-tolerance-interface.readthedocs.io/en/develop/).
In addition, we also provide documentation in the repository.
The user manual in the "doc/manual" folder contains the API description and code snippets that show how to use FTI checkpoint I/O.
It is also possible to generate full Doxygen documentation during the build process.
To do so, add the CMake flag `-DENABLE_DOCU=ON` when building the library and execute the following command in the build directory:
```
    make doc
```
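Put together, the documentation build is a configure step followed by `make doc`. The sketch below wraps both in a dry-run helper, so the commands are printed by default and executed only when `DRY_RUN=0` is set. The names `run`, `build_docs` and `DRY_RUN` are assumptions made for illustration, and other configure flags such as the install prefix are omitted for brevity.

```shell
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Configure with Doxygen support, then build the documentation.
# Intended to be called from inside the build directory.
build_docs() {
    run cmake -DENABLE_DOCU=ON .. &&
    run make doc
}

build_docs
```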
Acknowledgement (send us a postal card! \\\(^-^\)/\)
===

If you use FTI, please consider sending us an email to let us know what you
liked and what could be improved (:email: leonardo (dot) bautista (at) bsc (dot) es).
Your feedback is important.

If you use FTI for any research work, please make sure to acknowledge our paper:
Bautista-Gomez, Leonardo, et al. ***"FTI: high performance fault tolerance interface
for hybrid systems."*** Proceedings of 2011 international conference for high
performance computing, networking, storage and analysis. ACM, 2011.

This work has been supported by EU H2020 ICT project LEGaTO, contract #780681.

Finally, don't hesitate to send us a postal card at:
Dr. Leonardo Bautista-Gomez (Leo)
Barcelona Supercomputing Center
Carrer de Jordi Girona, 29-31, 08034 Barcelona, SPAIN.
Phone :telephone_receiver: : +34 934 13 77 16