Browsing by Author "Chakroun, Imen"

Now showing 1 - 4 of 4

GPU-accelerated Bounding for Branch-and-Bound applied to a Permutation Problem using Data Access Optimization
(John Wiley & Sons, 2013-11) Melab, Nouredine; Chakroun, Imen; Bendjoudi, Ahcène
Branch-and-Bound (B\&B) algorithms are attractive methods for solving to optimality combinatorial optimization problems using an implicit enumeration of a dynamically built tree-based search space. Nevertheless, they are time-consuming when dealing with large problem instances. Therefore, pruning tree nodes (subproblems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires to perform the bounding operation which consists of applying a lower bound function to the subproblems generated during the exploration process. Preliminary experiments performed on the Flow-Shop scheduling problem (FSP) have shown that the bounding operation consumes over $98\%$ of the execution time of the B\&B algorithm. In this paper, we investigate the use of GPU computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators. The proposed approach enables data access optimization. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based single core execution using an Intel Core i7-970 processor without GPU, speedups higher than $100$ times faster are achieved for large problem instances. At an equivalent peak performance, GPU-accelerated B\&B is twice faster than its multi-core counterpart.
Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
(2013) Chakroun, Imen; Mezmaz, Mohand; Melab, Nouredine; Bendjoudi, Ahcène
In this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound algorithms (B&B) for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD model at execution. A major issue which raises when executing on GPU a B&B applied to FSP is thread or branch divergence. Such divergence is caused by the lower bound function of FSP which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP dealing with thread divergence. Extensive experiments of the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based execution, accelerations up to ×77.46 are achieved for large problem instances.
Reducing Thread Divergence in GPU-based B&B Applied to the Flow-shop problem
(2011-09-11) Chakroun, Imen; Bendjoudi, Ahcène; Melab, Nouredine
In this paper,we propose a pioneering work on designing and programming B&B algorithms on GPU. To the best of our knowledge, no contribution has been proposed to raise such challenge. We focus on the parallel evaluation of the bounds for the Flow-shop scheduling problem. To deal with thread divergence caused by the bounding operation, we investigate two software based approaches called thread data reordering and branch refactoring. Experiments reported that parallel evaluation of bounds speeds up execution up to 54.5 times compared to a CPU version.
Reducing Thread Divergence in GPU-Based B&B Applied to the Flow-Shop Problem
(Springer Berlin Heidelberg, 2012) Chakroun, Imen; Bendjoudi, Ahcène; Melab, Nouredine
In this paper,we propose a pioneering work on designing and programming B&B algorithms on GPU. To the best of our knowledge, no contribution has been proposed to raise such challenge. We focus on the parallel evaluation of the bounds for the Flow-shop scheduling problem. To deal with thread divergence caused by the bounding operation, we investigate two software based approaches called thread data reordering and branch refactoring. Experiments reported that parallel evaluation of bounds speeds up execution up to 54.5 times compared to a CPU version.