Hi there,
I am doing a project on web usage mining of my university's server logs, and I'm wondering how I go about mining them in SQL Server 2005.
Do I mine them in one table? Do I normalise the web log data? Which algorithms should I use, given that I'm trying to get usage patterns from the users and also find out where most of the users come from?
Thanks in advance
Gary
Hi
Here are some thoughts that might help you design your data mining for web logs:
1. You can put the data in one or more tables, depending on the semantics of the data. Data that is an entity by itself should go in a flat table with one key per entry (user sessions on the web server, for example), whereas data that naturally has a many-to-one relationship with the primary data (page visits per session, for example) can go in a separate table with a primary/foreign key relationship. SQL Server 2005 will model them as a case table and a nested table for the purpose of mining.
2. Normalize: It depends on what your data looks like. If you want to run a clustering algorithm and your data has two attributes, A and B, and A is 10 times more important than B, you should normalize accordingly. If the actual values of A are in the order of thousands and the actual values of B are in the order of tens, and they are equally important, you should again normalize. However, if you want to use the as-is values without a weight, you do not have to normalize the data.
3. Usage patterns, such as a sequence of page visits, can be modeled using the sequence clustering algorithm. User categorization based on attributes could use a clustering algorithm. If you have more information on what you want to find out, I might be able to suggest more specific choices.
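To make points 1 and 2 a bit more concrete, here is a rough Python sketch of the data shapes involved. It is not SQL Server code and does not use any Analysis Services API. The record layout, the column names (session_id, seq, page_count, total_bytes) and the min_max_scale helper are all made up for illustration, and min-max scaling is only one possible way to normalize. It splits already-parsed log records into a sessions table (the case table) and a page-visits table (the nested table), then rescales two session attributes to a common range with an optional weight.

```python
from collections import defaultdict

# Hypothetical, already-parsed log records: (session_id, page, bytes_sent).
# Real web logs would first need parsing and session reconstruction.
records = [
    ("s1", "/home", 1200),
    ("s1", "/courses", 3400),
    ("s2", "/home", 1100),
    ("s2", "/library", 5200),
    ("s2", "/contact", 800),
]

# Group the visits by session, keeping the order in which they occurred.
visits_per_session = defaultdict(list)
for session_id, page, bytes_sent in records:
    visits_per_session[session_id].append((page, bytes_sent))

sessions = []     # case table: one row per session, keyed by session_id
page_visits = []  # nested table: many rows per session (foreign key session_id)
for session_id, visits in visits_per_session.items():
    sessions.append({
        "session_id": session_id,
        "page_count": len(visits),
        "total_bytes": sum(b for _, b in visits),
    })
    for seq, (page, _) in enumerate(visits, start=1):
        page_visits.append({"session_id": session_id, "seq": seq, "page": page})


def min_max_scale(values, weight=1.0):
    """Rescale values to [0, weight]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [weight * (v - lo) / (hi - lo) for v in values]


# Equal importance: both attributes end up in the range [0, 1].
scaled_bytes = min_max_scale([s["total_bytes"] for s in sessions])
scaled_pages = min_max_scale([s["page_count"] for s in sessions])

# If page_count should count 10 times as much as total_bytes, give it weight 10.
weighted_pages = min_max_scale([s["page_count"] for s in sessions], weight=10.0)
```

The seq column in the nested table preserves the visit order within each session, which is the kind of ordering a sequence-based model needs.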
Hope this helps
Shuvro