Distributed graph pattern matching via bounded dual simulation

dc.contributor.authorBouhenni, Sarra
dc.contributor.authorYahiaoui, Saïd
dc.contributor.authorNouali-Taboudjemat, Nadia
dc.contributor.authorKheddouci, Hamamache
dc.date.accessioned2023-02-26T08:20:48Z
dc.date.available2023-02-26T08:20:48Z
dc.date.issued2022-09
dc.description.abstractGraph Pattern Matching (GPM) finds subgraphs of a large data graph that are similar to an input query graph. It has many applications, such as pattern recognition, detecting plagiarism, and finding communities in social networks. Current real-world applications generate massive amounts of data that cannot be stored on the memory of a single machine, which raises the need for distributed storage and processing. Recent relaxed GPM models, although of polynomial time complexity, are nevertheless not distributed by nature. Moreover, the existing relaxed GPM algorithms are limited in terms of scalability. In this paper, we propose Bounded Dual Simulation (BDSim) as a new relaxed model for a scalable evaluation of GPM in massive graphs. BDSim captures more semantic similarities compared to graph simulation, dual simulation, and even strong simulation. It preserves the vertices’ proximity by eliminating cycles of unbounded length from the resulting match graph. Furthermore, we propose distributed vertex-centric algorithms to evaluate BDSim. We prove their effectiveness and efficiency through detailed theoretical validation and extensive experiments conducted on real-world and synthetic datasets. To the best of our knowledge, BDSim is the first relaxed GPM model that captures the cyclic structure of the query graph while being feasible in cubic time.
dc.identifier.issn0020-0255
dc.identifier.urihttps://dl.cerist.dz/handle/CERIST/964
dc.publisherElsevier
dc.relation.ispartofseriesInformation Sciences; Vol. 610
dc.relation.pages549-570
dc.structureCalcul pervasif et mobile (Pervasive and Mobile Computing group)
dc.subjectGraph pattern matching
dc.subjectSubgraph matching
dc.subjectMassive graphs
dc.subjectGraph simulation
dc.subjectDistributed graph processing
dc.subjectGPM
dc.titleDistributed graph pattern matching via bounded dual simulation
dc.typeArticle
Files