Introduction to NLP and LLMs

Course Description

This course introduces the fundamental concepts of Natural Language Processing (NLP) and modern language model technologies. Students learn through hands-on practice, progressing from basic text processing to language model API use and NLP application development. The course emphasizes Large Language Models (LLMs) and prompt engineering, with the aim of building practical skills in applying current NLP technologies.

Learning Objectives

Understand the basic concepts and key technologies of NLP and language models.
Practice core NLP techniques such as text preprocessing and word embeddings, and analyze the transformer architecture.
Learn methods to perform various NLP tasks using LLM APIs.
Master prompt engineering techniques and apply them to solve real-world problems.
Develop skills to design and implement NLP-based web applications.
Understand the ethical aspects of LLM utilization and learn methods for developing safe AI systems.

Course Outline

Week 1

Overview: Introduction to Natural Language Processing and Language Models
Key Learning Content: Basic concepts of NLP, application areas, introduction to major tasks
Note: Lecture, Discussion of NLP application cases

Week 2

Overview: Basics of Text Preprocessing
Key Learning Content: Tokenization, normalization, stop word removal
Note: Lecture, Practice (Text preprocessing using NLTK library)

Week 3

Overview: Fundamentals of Language Models
Key Learning Content: N-gram models, statistical language models
Note: Lecture, Practice (Implementation of simple N-gram models)
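
As an illustration of the kind of exercise the Week 3 practice might involve, here is a minimal sketch of a bigram (2-gram) model in plain Python. The function name `train_bigram_model`, the toy corpus, and the `<s>` / `</s>` boundary markers are assumptions made for this example, not course material; smoothing is deliberately left out.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | previous_word) from raw bigram counts.

    Hypothetical helper for illustration: whitespace tokenization,
    lowercasing, and no smoothing for unseen bigrams.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Assumed sentence-boundary markers, a common textbook convention.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    model = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        # Maximum-likelihood estimate: count(prev, next) / count(prev).
        model[prev] = {word: c / total for word, c in next_counts.items()}
    return model


# Toy corpus, invented for this sketch.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)
print(model["the"])
```

In this toy corpus "the" is followed by "cat" twice and "dog" once, so the estimated probability of "cat" after "the" is 2/3. A real exercise would likely add smoothing and a larger corpus.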

Week 4

Overview: Word Embeddings
Key Learning Content: Word2Vec, GloVe, FastText
Note: Lecture, Practice (Creating and visualizing word embeddings using Gensim)

Week 5

Overview: Introduction to Transformer Architecture
Key Learning Content: Attention mechanism, transformer structure
Note: Lecture, Analysis of transformer model structure

Week 6

Overview: Understanding LLM APIs
Key Learning Content: OpenAI API usage, tokenization, sampling methods
Note: Lecture, Practice (Simple text generation through API calls)

Week 7

Overview: Basics of Prompt Engineering
Key Learning Content: Zero-shot and few-shot prompting, chain-of-thought prompting
Note: Lecture, Practice (Applying various prompting techniques)

Week 8

Overview: Midterm Project Presentation
Key Learning Content: Development of NLP app prototype using content from weeks 1-7
Note: Student project presentations and feedback

Week 9

Overview: Text Classification
Key Learning Content: Sentiment analysis, topic classification, fine-tuning techniques
Note: Lecture, Practice (Implementing text classification model using BERT)

Week 10

Overview: Building LLM-based Q&A Systems
Key Learning Content: Introduction to vector databases, document parsing
Note: Lecture, Practice (Implementing a simple Q&A system)

Week 11

Overview: Basics of Web Application Development
Key Learning Content: Introduction to Flask/Streamlit, basic web app structure
Note: Lecture, Practice (Creating a web app prototype using LLM API)

Week 12

Overview: Controlling and Structuring LLM Outputs
Key Learning Content: Adjusting temperature, utilizing top_p, JSON output
Note: Lecture, Practice (Building an app for structured data extraction)
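
To make the two sampling controls named in Week 12 concrete, the following sketch shows a common textbook formulation of temperature scaling and top_p (nucleus) filtering over a small list of logits. It is plain Python and does not call any LLM API; the function names and the example logits are assumptions for illustration, and how a given provider implements these parameters internally is not specified here.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature must be > 0.

    Lower temperatures sharpen the distribution, higher ones flatten it.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set.

    Returns a dict mapping token index to renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Example logits for three hypothetical tokens.
logits = [2.0, 1.0, 0.1]
default_probs = softmax_with_temperature(logits, temperature=1.0)
sharp_probs = softmax_with_temperature(logits, temperature=0.5)
nucleus = top_p_filter(default_probs, top_p=0.8)
print(default_probs, sharp_probs, nucleus)
```

With these logits, lowering the temperature from 1.0 to 0.5 increases the probability of the most likely token, and a top_p of 0.8 drops the least likely token before sampling.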

Week 13

Overview: Introduction to RAG (Retrieval-Augmented Generation)
Key Learning Content: RAG architecture, basics of vector search
Note: Lecture, Practice (Implementing a simple RAG system)
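
The retrieval step at the heart of RAG can be sketched without any external library. The example below is a simplification under stated assumptions: bag-of-words counts stand in for a real embedding model, a Python list stands in for a vector database, and the prompt template is invented for illustration. The helper names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': term counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Tiny document store, invented for this sketch.
documents = [
    "tokenization splits text into tokens",
    "word embeddings map words to vectors",
    "transformers use attention layers",
]
prompt = build_prompt("what are word embeddings", documents)
print(prompt)
```

The resulting prompt would then be sent to an LLM, which is the generation half of RAG and is omitted here.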

Week 14

Overview: Ethics and Safety in LLM Applications
Key Learning Content: Bias detection, content filtering, preventing prompt injection
Note: Lecture, Discussion (Ethical considerations in LLM usage)

Week 15

Overview: Final Project Presentation and Course Wrap-up
Key Learning Content: Sharing results of NLP app development projects
Note: Student project presentations, feedback, discussion on future learning directions

Evaluation

Attendance and Participation (10%)
Weekly Practical Assignments (30%)
Midterm Project (25%)
Final Project (35%)

Course Materials

Lecture Notes: https://nlp2024.halla.ai
GitHub: https://github.com/entelecheia/intronlp-2024
OpenAI API documentation, Hugging Face documentation, latest NLP-related papers and blog posts

Prerequisites

Basic Python Programming
Fundamentals of Statistics and Linear Algebra

Additional Notes

A personal laptop is required because the course is practice-oriented
Course content may be partially modified to reflect the latest technology trends