FTH-B&B: a Fault-Tolerant Hierarchical Branch and Bound for Large Scale Unreliable Environment
Solving to optimality large instances of combinatorial optimization problems using Brand and Bound (B&B) algorithms requires a huge amount of computing resources. In this paper, we investigate the design and implementation of such algorithms on computational grids. Most of existing grid-based B&B algorithms are based on the Master-Worker paradigm, their scalability is therefore limited. Moreover, even if the volatility of resources is a major issue in grids fault tolerance is rarely addressed. We propose FTH-B&B, a fault tolerant hierarchical B&B. FTH-B&B is based on different new mechanisms enabling to efficiently build and maintain balanced the hierarchy, and to store and recover work units (sub-problems). FTH-B&B has been implemented on top of ProActive and applied to the Flow-Shop scheduling problem. Very often, the validation of existing grid-based B&B works is performed either through simulation or a very small real grid. In this paper, we experimented FTH-B&B on the Grid'5000 real French nation-wide computational grid using up to 1900 processor cores distributed over 6 sites. The reported results show that the overhead induced by the approach is very low and an efficiency close to 100% can be achieved on some Taillards benchmarks of the Flow-Shop problem. In addition, the results demonstrate the robustness of the approach even in extreme failure situations.
Large scale experimentation, Paralle Branch and Bound, Grid computing, Fault tolerance