An Efficient and Privacy-preserving Similarity Evaluation For Big Data Analytics
Big data systems gather ever more information in order to discover new value through data analytics and deeper insights. However, mining sensitive personal information breaches privacy and damages the reputation of services. Accordingly, many research works have been proposed to address the privacy issues of data analytics, but most of them seem unsuitable for the big data context, either in the data types they support or in their computation time. In this paper, we propose a novel privacy-preserving cosine similarity computation protocol that supports both binary and numerical data types within an efficient computation time, and we prove its adequacy for the high volume, high variety, and high velocity of big data.
big data, data analytics, cosine similarity, privacy.