Introduction to NLP and LLMs

Course Description

This course introduces the fundamental concepts of Natural Language Processing (NLP) and modern language model technologies. Students learn through hands-on practice, progressing from basic text processing to using language model APIs and developing NLP applications. The course emphasizes Large Language Models (LLMs) and prompt engineering, with the aim of building practical skills in applying current NLP technologies.

Learning Objectives

  1. Understand the basic concepts and key technologies of NLP and language models.
  2. Practice core NLP techniques such as text preprocessing and word embeddings, and analyze the transformer architecture.
  3. Learn methods to perform various NLP tasks using LLM APIs.
  4. Master prompt engineering techniques and apply them to solve real-world problems.
  5. Develop skills to design and implement NLP-based web applications.
  6. Understand the ethical aspects of LLM utilization and learn methods for developing safe AI systems.

Course Outline

Week 1

  • Overview: Introduction to Natural Language Processing and Language Models
  • Key Learning Content: Basic concepts of NLP, application areas, introduction to major tasks
  • Note: Lecture, Discussion on NLP application cases

Week 2

  • Overview: Basics of Text Preprocessing
  • Key Learning Content: Tokenization, normalization, stop word removal
  • Note: Lecture, Practice (Text preprocessing using NLTK library)
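
To illustrate the three preprocessing steps listed for this week, here is a minimal sketch using only the Python standard library. It is not the NLTK-based exercise itself: the `preprocess` function name and the tiny `STOP_WORDS` set are assumptions made for illustration, and NLTK provides its own tokenizers and a much larger stop word list.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, and remove stop words from a piece of text."""
    normalized = text.lower()                      # normalization: case folding
    tokens = re.findall(r"[a-z0-9]+", normalized)  # tokenization: alphanumeric runs
    return [t for t in tokens if t not in STOP_WORDS]  # stop word removal


if __name__ == "__main__":
    print(preprocess("The cat is sitting in the garden."))
```

The regular expression keeps only ASCII letters and digits, which is enough for a first English example but would need to change for other languages.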

Week 3

  • Overview: Fundamentals of Language Models
  • Key Learning Content: N-gram models, statistical language models
  • Note: Lecture, Practice (Implementation of simple N-gram models)
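
As a rough idea of what a simple N-gram model can look like, the sketch below estimates bigram probabilities by maximum likelihood from raw counts. The function names `train_bigram` and `bigram_prob`, the whitespace tokenization, and the `<s>`/`</s>` boundary markers are illustrative assumptions, not the course's reference implementation, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with sentence-boundary markers so starts and ends are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_prob(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 for an unseen prev."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[curr] / total if total else 0.0


if __name__ == "__main__":
    model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
    print(bigram_prob(model, "the", "cat"))
```

Because unseen bigrams receive probability zero here, a practical model would add some form of smoothing.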

Week 4

  • Overview: Word Embeddings
  • Key Learning Content: Word2Vec, GloVe, FastText
  • Note: Lecture, Practice (Creating and visualizing word embeddings using Gensim)

Week 5

  • Overview: Introduction to Transformer Architecture
  • Key Learning Content: Attention mechanism, transformer structure
  • Note: Lecture, Analysis of transformer model structure

Week 6

  • Overview: Understanding LLM APIs
  • Key Learning Content: OpenAI API usage, tokenization, sampling methods
  • Note: Lecture, Practice (Simple text generation through API calls)

Week 7

  • Overview: Basics of Prompt Engineering
  • Key Learning Content: Zero-shot, few-shot prompting, chain-of-thought technique
  • Note: Lecture, Practice (Applying various prompting techniques)

Week 8

  • Overview: Midterm Project Presentation
  • Key Learning Content: Development of an NLP app prototype using content from Weeks 1-7
  • Note: Student project presentations and feedback

Week 9

  • Overview: Text Classification
  • Key Learning Content: Sentiment analysis, topic classification, fine-tuning techniques
  • Note: Lecture, Practice (Implementing a text classification model using BERT)

Week 10

  • Overview: Building LLM-based Q&A Systems
  • Key Learning Content: Introduction to vector databases, document parsing
  • Note: Lecture, Practice (Implementing a simple Q&A system)

Week 11

  • Overview: Basics of Web Application Development
  • Key Learning Content: Introduction to Flask/Streamlit, basic web app structure
  • Note: Lecture, Practice (Creating a web app prototype using an LLM API)

Week 12

  • Overview: Controlling and Structuring LLM Outputs
  • Key Learning Content: Adjusting temperature, utilizing top_p, JSON output
  • Note: Lecture, Practice (Building an app for structured data extraction)
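
The effect of the temperature and top_p settings can be sketched conceptually on a toy distribution. The code below is an illustration under stated assumptions, not a description of how any particular LLM API implements sampling: the function names and the example logits are hypothetical, and the top-p step follows the common description of nucleus sampling (keep the smallest set of most likely tokens whose cumulative probability reaches top_p, then renormalize).

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution.
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    print(softmax_with_temperature(logits, temperature=1.0))
    print(softmax_with_temperature(logits, temperature=0.5))
    print(top_p_filter(softmax_with_temperature(logits), top_p=0.8))
```

Lowering the temperature concentrates probability on the top token, while a smaller top_p removes low-probability tokens from consideration entirely.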

Week 13

  • Overview: Introduction to RAG (Retrieval-Augmented Generation)
  • Key Learning Content: RAG architecture, basics of vector search
  • Note: Lecture, Practice (Implementing a simple RAG system)
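
The retrieval half of a RAG pipeline can be sketched with plain word-count vectors and cosine similarity. This is a toy stand-in under stated assumptions: real systems typically use learned embeddings and a vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt`, as well as the prompt layout, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption for illustration)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Place the retrieved context ahead of the question, ready to send to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "paris is the capital of france",
        "the transformer uses attention",
        "bananas are yellow",
    ]
    print(build_prompt("what is the capital of france", docs))
```

The generation step, which would pass the assembled prompt to an LLM API, is intentionally left out of this sketch.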

Week 14

  • Overview: Ethics and Safety in LLM Applications
  • Key Learning Content: Bias detection, content filtering, preventing prompt injection
  • Note: Lecture, Discussion (Ethical considerations in LLM usage)

Week 15

  • Overview: Final Project Presentation and Course Wrap-up
  • Key Learning Content: Sharing results of NLP app development projects
  • Note: Student project presentations, feedback, discussion on future learning directions

Evaluation

  1. Attendance and Participation (10%)
  2. Weekly Practical Assignments (30%)
  3. Midterm Project (25%)
  4. Final Project (35%)

Course Materials

Prerequisites

  • Basic Python Programming
  • Fundamentals of Statistics and Linear Algebra

Additional Notes

  • A personal laptop is required, as the course is practice-oriented
  • Course content may be partially modified to reflect the latest technology trends