Dynamic Algorithm Modeling Application (DAMA)
DAMA is a generic distributed computing application that can be reconfigured to model the structure of different algorithms. It follows the single program, multiple data (SPMD) parallelism model and has been implemented using four different parallel programming solutions: Apache Spark, Hadoop MapReduce, Apache Hama and MPIJava.
Once DAMA has been configured to model the structure of a given algorithm, it can be used directly as an approximate benchmark for that algorithm on any of the supported frameworks or libraries. This process requires no programming or code debugging, so it significantly simplifies estimating the performance of the algorithm on different frameworks: there is no need to implement the algorithm on each available framework and use those implementations as benchmarks.
To be clear, implementing and debugging the algorithm will still be required once the target distributed computing framework is chosen. The goal of DAMA is simply to postpone this work until it is known which framework should give the best result, and thus to greatly reduce the amount of work that has to be done. We feel that this kind of approach is critical as the number of available distributed computing frameworks continues to grow, both inside and outside the Hadoop ecosystem.
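To illustrate what modeling the structure of an algorithm can mean in practice, here is a minimal Python sketch. It is not DAMA's code and does not reflect DAMA's actual configuration format: the `AlgorithmModel` fields and the `run_worker` and `run_benchmark` functions are hypothetical names chosen for this example, and threads in a single process stand in for distributed workers. Every worker runs the same program on its own rank (SPMD), and the configuration alone decides how much local computation and how much communication each iteration performs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmModel:
    """Hypothetical description of an algorithm's structure (not DAMA's format)."""
    iterations: int     # number of synchronised iterations
    compute_ops: int    # units of dummy local work per worker per iteration
    message_bytes: int  # bytes each worker sends to every other worker per iteration


def run_worker(rank, model, barrier, mailboxes):
    """The single program that every worker executes with its own rank (SPMD)."""
    n_workers = len(mailboxes)
    payload = bytes(model.message_bytes)
    checksum = 0
    received = 0
    for _ in range(model.iterations):
        # Local computation phase: dummy arithmetic standing in for real work.
        for i in range(model.compute_ops):
            checksum = (checksum + i * (rank + 1)) % 1_000_003
        # Communication phase: send the payload to every other worker.
        for peer in range(n_workers):
            if peer != rank:
                mailboxes[peer].append(payload)
        barrier.wait()  # all messages of this iteration have been sent
        received += sum(len(message) for message in mailboxes[rank])
        mailboxes[rank].clear()
        barrier.wait()  # all mailboxes are drained before the next iteration
    return received


def run_benchmark(model, n_workers):
    """Run the modelled algorithm and return (elapsed seconds, bytes received per worker)."""
    barrier = threading.Barrier(n_workers)
    mailboxes = [[] for _ in range(n_workers)]
    start = time.perf_counter()
    # max_workers must equal n_workers so that all workers can reach the barrier.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_worker, rank, model, barrier, mailboxes)
                   for rank in range(n_workers)]
        received = [future.result() for future in futures]
    return time.perf_counter() - start, received


if __name__ == "__main__":
    # Two illustrative configurations of the same program.
    compute_heavy = AlgorithmModel(iterations=5, compute_ops=200_000, message_bytes=64)
    communication_heavy = AlgorithmModel(iterations=50, compute_ops=1_000, message_bytes=65_536)
    for name, model in [("compute-heavy", compute_heavy),
                        ("communication-heavy", communication_heavy)]:
        elapsed, _ = run_benchmark(model, n_workers=4)
        print(f"{name}: {elapsed:.3f} s")
```

Switching from one modelled algorithm to another only requires a different `AlgorithmModel` instance; the worker program stays the same. This mirrors the idea of reconfiguring one generic application instead of writing a new benchmark for each algorithm.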
Source code for DAMA is available at https://github.com/pjakovits/dama