Distributed Partial Simulation for Graph Pattern Matching

Pattern matching in big graphs is important for different modern applications. Recently, this problem was defined in terms of multiple extensions of graph simulation, to reduce complexity and capture more meaningful results. These results were achieved through the relaxation of commonly used constraint in subgraph isomorphism pattern matching. Nevertheless, these graph simulation variant models are still too strict to provide results in many cases, especially when analyzed graphs contain anomalies and incomplete information. To deal with this issue, we introduce a new graph pattern matching (GPM) method, called partial simulation, capable of retrieving matches despite missing parts of the pattern graph, such as vertices and/or edges. Furthermore, considering the number and inequality of the outputs, we define a relevance function to compute a value expressing how each match vertex respects the pattern graph. Similarly, we define partial dual simulation GPM that returns vertices that satisfy a part of the dual simulation constraints and assigns a relevance value to them. Additionally, we provide distributed scalable algorithms to evaluate the proposed partial simulation methods based on the distributed vertex-centric programming paradigm. Finally, our experiments on real-world data graphs demonstrate the effectiveness of the proposed models and the efficiency of their associated algorithms.
Big graphs, Graph pattern matching, Graph simulation, Distributed computing, Vertex-centric programming