Authors: Melab, Nouredine; Chakroun, Imen; Bendjoudi, Ahcène
Dates: 2013-09-21; 2013-09-21; 2013-11
URL: http://dl.cerist.dz/handle/CERIST/212

Abstract: Branch-and-Bound (B\&B) algorithms are attractive methods for solving combinatorial optimization problems to optimality using an implicit enumeration of a dynamically built tree-based search space. Nevertheless, they are time-consuming when dealing with large problem instances. Therefore, pruning tree nodes (subproblems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires the bounding operation, which consists of applying a lower bound function to the subproblems generated during the exploration process. Preliminary experiments performed on the Flow-Shop scheduling problem (FSP) have shown that the bounding operation consumes over $98\%$ of the execution time of the B\&B algorithm. In this paper, we investigate the use of GPU computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators. The proposed approach enables data access optimization. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a single-core CPU execution on an Intel Core i7-970 processor without GPU, speedups of more than $100\times$ are achieved for large problem instances. At an equivalent peak performance, GPU-accelerated B\&B is twice as fast as its multi-core counterpart.

Keywords: Massively Parallel Computing; GPU Computing; Branch-and-Bound Algorithms; Lower Bounding; Flow-Shop Scheduling
Title: GPU-accelerated Bounding for Branch-and-Bound applied to a Permutation Problem using Data Access Optimization
Type: Article
DOI: 10.1002/cpe.3155
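
The bounding step described in the abstract can be illustrated with a minimal CPU-side sketch in Python. It is not the implementation from the article: the function names (`prefix_completion`, `lower_bound`, `bound_pool`) are hypothetical, and the bound used here is a simple one-machine bound chosen for brevity, which may differ from the lower bound function the authors use. The sketch only shows the idea of applying a lower bound to a pool of subproblems and pruning those that cannot improve on the best known makespan; in a GPU setting, each subproblem of the pool would typically be bounded by its own thread.

```python
from typing import List, Sequence

Times = Sequence[Sequence[int]]  # p[j][k]: processing time of job j on machine k


def prefix_completion(p: Times, prefix: Sequence[int]) -> List[int]:
    """Completion time on each machine after scheduling the jobs in `prefix`."""
    m = len(p[0])
    c = [0] * m
    for j in prefix:
        c[0] += p[j][0]
        for k in range(1, m):
            # Job j starts on machine k once both the machine and the job are free.
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c


def lower_bound(p: Times, prefix: Sequence[int]) -> int:
    """One-machine lower bound on the makespan of any completion of `prefix`.

    Assumption: this simple bound stands in for the article's bounding function.
    """
    n, m = len(p), len(p[0])
    c = prefix_completion(p, prefix)
    rest = [j for j in range(n) if j not in prefix]
    if not rest:
        return c[-1]  # complete permutation: the bound is the exact makespan
    best = 0
    for k in range(m):
        load = sum(p[j][k] for j in rest)               # remaining work on machine k
        tail = min(sum(p[j][k + 1:]) for j in rest)     # shortest path to the last machine
        best = max(best, c[k] + load + tail)
    return best


def bound_pool(p: Times, pool: Sequence[Sequence[int]], upper_bound: int):
    """Keep only the subproblems whose lower bound beats the incumbent."""
    return [s for s in pool if lower_bound(p, s) < upper_bound]


# Tiny illustrative instance: 3 jobs, 2 machines (made-up data).
p = [[3, 2], [1, 4], [2, 1]]
pool = [(0,), (1,), (2,)]
survivors = bound_pool(p, pool, upper_bound=9)
```

On this toy instance the three subproblems have lower bounds 10, 8, and 9, so with an incumbent makespan of 9 only the subproblem starting with job 1 survives pruning.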