ABOUT THE COURSE
Modern internet search engines form the primary interface for most users interacting with the World Wide Web. The dramatic increase in data available on the Web in recent years means that automatic methods of Information Retrieval (IR) have acquired greater significance. For the purpose of this course, IR will mainly mean the study of the indexing, processing, storage, and querying of textual data. The course aims to introduce the core principles and techniques used in IR and demonstrate how statistical language models can be used to solve document indexing and retrieval problems.
COURSE OUTCOMES
CO - 1: Understand and describe the principles and components of information retrieval systems.
CO - 2: Apply and analyze core algorithms, term weighting, and probabilistic models in information retrieval systems.
CO - 3: Evaluate and synthesize advanced techniques like Latent Semantic Indexing, web search algorithms, and cross-language retrieval within information retrieval systems.
CO - 4: Create and evaluate information retrieval systems using test collections, user-centered approaches, and ethical considerations.
COURSE MODULES
MODULE 1: Introduction to Information Retrieval - Information Retrieval: Definition and Importance, Real-world Applications, Challenges of Handling Large Volumes of Information, Need for Efficient Retrieval Techniques, Retrieval Models: Boolean, Vector Space and Probabilistic Models, Query Processing and Document Indexing, Evaluation Metrics in Information Retrieval: Precision, Recall, F1-Score, Mean Average Precision.
MODULE 2: Information Retrieval Algorithms - Inverted Index Construction and Compression, Term Weighting: TF-IDF, Vector Space Model and Cosine Similarity, Probabilistic Retrieval Models: Okapi BM25, Handling Queries with Multiple Terms, Relevance Feedback and Query Expansion.
MODULE 3: Advanced Information Retrieval Techniques - Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD), Web Search and Link Analysis: PageRank, HITS algorithm, Machine Learning for Information Retrieval, Cross-language Information Retrieval, Handling Multimedia Content in Retrieval Systems.
MODULE 4: Evaluation and Optimization of Information Retrieval Systems - Test Collections and Evaluation Methodologies, Information Retrieval Evaluation Metrics, Performance Optimization Techniques, User-centered Evaluation and User Studies, Ethical Considerations in Information Retrieval.
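As a small illustration of the evaluation metrics named in Modules 1 and 4, the sketch below computes precision, recall, F1-score, and average precision for a single query. The function names and the document identifiers in the usage lines are hypothetical and chosen only for this example; Mean Average Precision would be the mean of `average_precision` over a set of queries.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(ranking, relevant):
    """Average of precision@k taken at each rank k where a relevant document appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Dividing by the number of relevant documents penalizes relevant items never retrieved.
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranking and relevance judgments for one query.
ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r, f1 = precision_recall_f1(ranking, relevant)
ap = average_precision(ranking, relevant)
```

Here two of the four retrieved documents are relevant, and one relevant document (`d5`) is never retrieved, so precision and recall differ and average precision is pulled down accordingly.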
COURSE PREREQUISITES
All students should have knowledge of Calculus, Linear Algebra, and Probability & Statistics, and should be able to code in Python.
TEXTBOOKS
"Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
"Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
"Information Retrieval: Algorithms and Data Structures" by William B. Frakes and Ricardo
"Information Retrieval: Implementing and Evaluating Search Engines" by Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack
LEARNING MATERIALS
Module - 1: Introduction to Information Retrieval
Module - 2: IR: Indexing, Storage, and Compression
Module - 3: Retrieval Models, LSI, Efficiency Considerations, Relevance Feedback, Query Expansion
IR Models (Reading Material) - https://research.utwente.nl/files/5588097/IRModelsTutorial-draft.pdf
LAB EXERCISES
Exercise 1: Tokenization, Stemming, Lemmatization, Stopwords, Phrases
Exercise 2: Okapi BM25
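For orientation before the lab, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the official lab solution. It assumes whitespace tokenization with lowercasing (no stemming or stopword removal), the commonly used parameter values k1 = 1.5 and b = 0.75, and a smoothed IDF with 1 added inside the logarithm so that scores stay non-negative. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document in `docs` (assumed non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs  # average document length
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant (assumption: "+ 1" inside the log).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy collection for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "information retrieval with okapi bm25",
]
scores = bm25_scores("okapi retrieval", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Only the third document contains the query terms, so it is the only one with a non-zero score. Swapping `tokenize` for the stemming or lemmatization pipeline from Exercise 1 would change which terms match without changing the scoring function.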
ASSIGNMENTS
VIDEO LECTURES
CONTACT
If you have any questions or need further clarification about the course, please contact me via email or during my office hours. Let's embark on this exciting journey into Information Retrieval!
Looking forward to a rewarding and knowledge-filled semester!
Dr. Anoop V. S.
Course Instructor