CO - 1: Understand the fundamental concepts, historical development, linguistic essentials, and text pre-processing techniques in NLP.
CO - 2: Apply various machine learning algorithms and text representations to perform sentiment analysis and text classification tasks.
CO - 3: Explore deep learning models and Transformer-based architectures to tackle complex NLP tasks, such as sentiment analysis and language generation.
CO - 4: Develop practical skills in building information retrieval systems, question-answering models, and text summarization while considering ethical implications in NLP.
MODULE 1: Foundations of Natural Language Processing - Introduction to NLP: Definition, Scope, and Historical Background; Linguistic Essentials for NLP: Phonetics, Phonology, Morphology, Syntax, Semantics, and Pragmatics; Text Pre-processing: Tokenization, Stemming and Lemmatization, Stop-word removal, Part-of-speech tagging, Named Entity Recognition (NER); Language Modeling: N-grams, Hidden Markov Models (HMM), Introduction to neural language models.
MODULE 2: Natural Language Processing Techniques - Machine Learning for NLP: Supervised, unsupervised, and semi-supervised learning in NLP, Feature extraction for text data, Sentiment analysis as a classification problem; Text Classification: Naive Bayes Classifier, Support Vector Machines, Neural Networks for Text Classification (CNN, RNN); Language Understanding: Introduction to Word Embeddings (Word2Vec and GloVe), Distributional Semantics and Word Similarity, Text Representation using TF-IDF, Sequence-to-Sequence Models, Attention Mechanisms, Applications.
MODULE 3: Advanced NLP Topics - Deep Learning for NLP: Transformer-based models (BERT, GPT, XLNet), Fine-tuning pre-trained models; Sentiment Analysis and Emotion Recognition: Aspect-based Sentiment Analysis, Detecting emotions from text using deep learning; Named Entity Recognition and Entity Linking: Entity Linking with knowledge bases; Natural Language Generation: Text Generation with Recurrent Neural Networks, Introduction to Generative Adversarial Networks (GANs) for text.
MODULE 4: NLP Applications and Ethics - Information Retrieval Models: Boolean Retrieval, Vector Space Models, Evaluation Metrics; Question Answering Systems: QA pipelines, Reading comprehension with attention-based models; Text Summarization: Extractive vs. Abstractive Summarization, Sequence-to-Sequence models for summarization; Ethics and Bias in NLP: Addressing bias in language models, Ethical considerations in NLP applications, Responsible use of NLP in society.
Knowledge of Python programming is a strong prerequisite for this course. Basic math and statistics skills are also nice to have.
Familiarity with Git, JupyterLab, and/or Google Colab is highly appreciated but not mandatory.
It would be helpful if students are familiar with object-oriented programming, simple data structures such as hash maps, and text processing.
Speech and Language Processing by Daniel Jurafsky and James H. Martin
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
Natural Language Processing in Action by Hobson Lane, Cole Howard, and Hannes Max Hapke
Deep Learning for Natural Language Processing by Palash Goyal, Sumit Pandey, and Karan Jain
Practical Natural Language Processing by Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana
Please note that the presentation materials will not be shared with the learners. Learners are requested to prepare their own lecture notes and to make use of the textbooks and reference books available in the University library. Supplementary course materials will be shared with the learners from time to time, through GitHub only.
The exercises, codes, supplementary materials, and relevant papers will be made available at this GitHub Repository.
Learners are requested to clone the repository to create a local copy as their working directory.
All learners should maintain a repository of their coding exercises and assignments on their GitHub profile; the instructor will evaluate it from time to time.
There will be no written assignments in this course. Regular quizzes, both planned and surprise, will be conducted throughout the course. Since the course follows a project-based learning strategy, there will be group activities such as interim and final project presentations. Individual vivas are also planned to assess each learner's progress. The instructor may decide on appropriate assignments and evaluation methods from time to time.
Video lectures for the course may be added to the instructor's YouTube channel, available HERE. You may subscribe to this channel to be notified when new videos are uploaded.
The instructor may share other publicly available relevant videos with the learners as and when required to supplement the learning process.