A Clustering Application for the Web Usage Mining
Web Usage Mining is a recent branch of Web Mining. It allows the behavior of users and potential customers to be studied through their navigation of a site. The main data source for Web Usage Mining is the server log file. A log file contains a large volume of data, including information about the user (username, software used, etc.) and all the requests the user has made on the website (requested files, number of bytes transferred, time spent on each page, entry page to the site, etc.). In this work we outline an application built on this type of data and based on a clustering method, namely K-Means. The application defines homogeneous groups that constitute user profiles, so that users' needs can be anticipated and communication can be adapted to each user segment. While building this application we encountered several technical problems. These problems concern data cleaning (removing requests for images and multimedia files associated with web pages, removing requests from search bots, etc.) and the construction of visitor sessions, where a session is a sequence of pages viewed by the same user.
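To make the two preprocessing problems concrete, the following is a minimal sketch of log cleaning and session construction. It is not the implementation used in this work: it assumes the Apache combined log format, identifies a user by host address only, and splits sessions with a 30-minute inactivity timeout, a common heuristic. The names `MEDIA_EXT`, `BOT_HINTS`, `parse_line`, `is_relevant`, and `build_sessions`, as well as the sample log lines, are hypothetical and chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache combined log format (the log format is not specified here).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical filter lists; a real deployment would tune these.
MEDIA_EXT = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js", ".mp3", ".mp4")
BOT_HINTS = ("bot", "crawler", "spider")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": m.group("url"),
        "agent": m.group("agent") or "",
    }


def is_relevant(entry):
    """Keep page requests only: drop media files and requests from search bots."""
    path = entry["url"].split("?")[0].lower()
    if path.endswith(MEDIA_EXT) or path == "/robots.txt":
        return False
    agent = entry["agent"].lower()
    return not any(hint in agent for hint in BOT_HINTS)


def build_sessions(entries, timeout=timedelta(minutes=30)):
    """Group requests into sessions: same host, gaps no longer than `timeout`."""
    sessions = []
    last_host, last_time = None, None
    for e in sorted(entries, key=lambda e: (e["host"], e["time"])):
        if e["host"] != last_host or e["time"] - last_time > timeout:
            sessions.append((e["host"], []))  # start a new session
        sessions[-1][1].append(e["url"])
        last_host, last_time = e["host"], e["time"]
    return sessions


# Illustrative log lines, invented for this example.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2000:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2000:13:58:00 +0000] "GET /products.html HTTP/1.0" 200 4100 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2000:14:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [10/Oct/2000:15:30:00 +0000] "GET /contact.html HTTP/1.0" 200 900 "-" "Mozilla/5.0"',
]

parsed = [p for p in (parse_line(line) for line in SAMPLE_LOG) if p is not None]
cleaned = [e for e in parsed if is_relevant(e)]
sessions = build_sessions(cleaned)
```

On the sample lines, the image request and the bot request are discarded, and the remaining requests from the same host are split into two sessions because more than 30 minutes separate the last two page views. Per-session features (for example, the number of pages viewed) could then be passed to a K-Means implementation to form the user groups.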
Clustering, K-Means, Web Usage Mining, Server Log File
- Research Reports