A Clustering Application for Web Usage Mining
Pushpa Publishing House
Web usage mining is a new branch of web mining. It makes it possible to study the behavior of users and potential customers through their navigation of a website. The main data source for web usage mining is the server log file. A log file contains a large volume of data, including user information (username, software used, etc.) and all the requests the user has made on the website (requested files, number of bytes transferred, time spent on each page, entry page to the site, etc.). In this work, we outline an application built on this type of data and based on a clustering method, namely K-means. The application defines homogeneous groups that constitute user profiles, in order to anticipate users' needs and to tailor communication to each user segment. While building this application, we encountered several technical problems. These concern data cleaning (removing requests for images and multimedia files associated with web pages, removing requests from search engines, etc.) and the construction of visitor sessions, where a session is a sequence of pages viewed by the same user.
Keywords: clustering, K-means, web usage mining, server log file
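The abstract names three processing steps: cleaning the log, building sessions, and clustering with K-means. The following is a minimal Python sketch of how these steps fit together. It is an illustration under stated assumptions, not the authors' implementation. The `Request` record is a hypothetical, already-parsed log entry. The cleaning rules (the extension list and bot keywords), the 30-minute inactivity timeout, and the three session features are all illustrative choices. The K-means routine is plain Lloyd's iteration with a deterministic farthest-first seeding, used here in place of the usual random initialization.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One already-parsed log entry (hypothetical structure; raw log lines need a parser first)."""
    user: str        # user identifier, e.g. IP address or username
    timestamp: int   # seconds since epoch
    url: str         # requested file
    bytes_sent: int  # number of bytes transferred
    agent: str       # browser / user-agent string


# Assumed cleaning rules, chosen for illustration only.
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".ico",
                    ".mp3", ".wav", ".mp4", ".avi")
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")
SESSION_TIMEOUT = 30 * 60  # assumed 30-minute inactivity threshold, in seconds


def clean(requests):
    """Drop requests for embedded images/multimedia and requests made by search-engine robots."""
    kept = []
    for r in requests:
        path = r.url.lower().split("?", 1)[0]  # ignore any query string
        if path.endswith(MEDIA_EXTENSIONS):
            continue
        if any(marker in r.agent.lower() for marker in BOT_MARKERS):
            continue
        kept.append(r)
    return kept


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Split each user's chronologically ordered requests into sessions at gaps longer than `timeout`."""
    by_user = {}
    for r in sorted(requests, key=lambda r: (r.user, r.timestamp)):
        by_user.setdefault(r.user, []).append(r)
    sessions = []
    for reqs in by_user.values():
        current = [reqs[0]]
        for prev, r in zip(reqs, reqs[1:]):
            if r.timestamp - prev.timestamp > timeout:
                sessions.append(current)  # inactivity gap: close the current session
                current = []
            current.append(r)
        sessions.append(current)
    return sessions


def features(session):
    """Describe a session as (pages viewed, duration in minutes, kilobytes transferred)."""
    duration = (session[-1].timestamp - session[0].timestamp) / 60
    kilobytes = sum(r.bytes_sent for r in session) / 1024
    return (float(len(session)), duration, kilobytes)


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding; returns (centroids, labels)."""
    # Seeding: start from the first point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (kept in place if empty).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

A typical run would call `clean` on the parsed log, pass the result to `sessionize`, map `features` over the sessions, and then call `kmeans`. Each resulting cluster is a candidate user profile. K-means relies on Euclidean distance, so in practice features on different scales (minutes versus kilobytes) would normally be standardized first. The number of clusters `k` must also be chosen by the analyst.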