With the rapid growth of the Web, finding the desired information on the Internet has become a tedious and time-consuming task. Focused crawlers address this problem by mining Web content. A variety of focused-crawling methods have been devised and implemented. In this book, we list these methods, group them into classes, and state the pros and cons of each. From an information retrieval viewpoint, many of these methods are not biased towards the more informative terms in multi-term topics. By considering the information content of terms, we propose the Term Frequency-Information Content (TF-IC) method, which assigns an appropriate weight to each term in a multi-term topic. We show that TF-IC outperforms other methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI).
The similarity function is key to the accuracy of collaborative filtering algorithms. Adding a time factor to it helps handle web data efficiently, since such data is highly dynamic. The data used in collaborative filtering algorithms is collected over a long period of time in the form of feedback, clicks, etc. A user's interests or an item's popularity tend to change with seasons, moods, or festivals. A similarity function with a temporal factor can handle these dynamics because it captures the age of the data and weights it accordingly: more recent data is given more weight when similarity is calculated. In this way, recent trends are captured and older, obsolete data values are discarded when new, unobserved items are predicted using collaborative filtering algorithms. The result is better and more accurate predictions.
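The temporal weighting described above can be illustrated with a minimal sketch. It is not the book's actual similarity function, which is not specified here. It assumes an exponential decay with a hypothetical `half_life_days` parameter and a plain cosine similarity over the decayed ratings; the names `time_weight` and `time_weighted_cosine` are made up for illustration. Old ratings are down-weighted towards zero rather than removed outright.

```python
import math


def time_weight(age_days, half_life_days=30.0):
    """Assumed decay: a rating that is half_life_days old counts half as much."""
    return 0.5 ** (age_days / half_life_days)


def time_weighted_cosine(ratings_a, ratings_b, now, half_life_days=30.0):
    """Cosine similarity between two users after decaying each rating by age.

    ratings_a, ratings_b: dict mapping item -> (rating, timestamp_in_days).
    now: current time, in the same day units as the timestamps.
    """
    def decayed(ratings):
        # Scale every rating by the weight that corresponds to its age.
        return {
            item: rating * time_weight(now - ts, half_life_days)
            for item, (rating, ts) in ratings.items()
        }

    wa, wb = decayed(ratings_a), decayed(ratings_b)
    common = wa.keys() & wb.keys()
    dot = sum(wa[i] * wb[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two users agree on a recent item "x" but disagreed on an old item "y".
alice = {"x": (5, 100), "y": (1, 0)}
bob = {"x": (5, 100), "y": (5, 0)}

recent_focus = time_weighted_cosine(alice, bob, now=100, half_life_days=30.0)
almost_no_decay = time_weighted_cosine(alice, bob, now=100, half_life_days=1e9)
print(recent_focus > almost_no_decay)
```

With a short half-life, the old disagreement on "y" contributes little, so the two users look more similar than they do when all ratings are weighted almost equally. A hard cutoff that drops ratings older than some threshold would be an alternative way to discard obsolete data.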