Machine Learning Systems Design
Course Description
This course covers the practical aspects of designing, implementing, and managing machine learning systems in production environments. It aims to equip students to integrate machine learning workflows with existing software development and operational processes, with emphasis on DevOps, MLOps, and security practices. Through hands-on projects, students will work with tools and methodologies such as dotfiles, Git, Docker, Kubernetes, CI/CD pipelines, and Weights & Biases to build, deploy, and maintain machine learning models. The course also introduces GitOps, DevSecOps, and LLMOps, preparing students for the complexities of real-world machine learning applications.
Learning Objectives
Upon successful completion of this course, students will be able to:
- Design, build, and deploy advanced machine learning systems within production settings, ensuring scalability, efficiency, and security.
- Integrate machine learning projects into broader software development and IT operations workflows (DevOps and MLOps).
- Master the use of essential tools and practices for continuous integration/continuous deployment (CI/CD), experiment tracking, model versioning, and security in machine learning workflows.
- Implement secure, reproducible, and efficient machine learning pipelines, leveraging containerization, serverless architectures, and cloud computing.
Prerequisites
- Proficiency in Python programming and fundamental machine learning concepts.
- Basic understanding of software development practices and tools.
Course Outline
Week 1: Introduction to Machine Learning System Design
Outline:
- Overview of machine learning system architecture.
- Challenges in deploying machine learning models in production.
Key Learning Outcomes:
- Understand the components and architecture of machine learning systems.
- Identify challenges in deploying models in production environments.
Week 2: Introduction to MLOps and DevOps
Outline:
- Fundamentals of MLOps and its importance.
- DevOps principles applied to machine learning.
Key Learning Outcomes:
- Grasp the significance of MLOps in the machine learning lifecycle.
- Apply DevOps principles to machine learning projects.
Week 3: Git and GitOps for Machine Learning
Outline:
- Best practices for version control with Git.
- Introduction to GitOps and its application in machine learning.
Key Learning Outcomes:
- Master version control with Git in the context of machine learning.
- Understand the application of GitOps in machine learning workflows.
Week 4: Security Practices in Machine Learning (DevSecOps)
Outline:
- Security considerations for machine learning models.
- Introduction to DevSecOps in the machine learning context.
Key Learning Outcomes:
- Identify security considerations specific to machine learning.
- Understand the principles of DevSecOps applied to machine learning.
Week 5: Efficient Environment Management with Dotfiles and Dotdrop
Outline:
- Managing development environments with dotfiles.
- Setting up Dotdrop for dotfile management.
Key Learning Outcomes:
- Efficiently manage development environments using dotfiles.
- Implement Dotdrop for streamlined dotfile management.
Week 6: Secure Setup for Development (SSH, GPG, AGE)
Outline:
- Security protocols for machine learning systems.
- Practical setup of SSH, GPG, and AGE for secure development.
Key Learning Outcomes:
- Implement secure communication protocols in machine learning projects.
- Set up SSH, GPG, and AGE for development security.
Week 7: Advanced Version Control and Project Management
Outline:
- GitHub setup and workflow optimization.
- Using SOPS, Pass, and Passage for secure secrets management.
Key Learning Outcomes:
- Optimize GitHub workflows for machine learning projects.
- Manage secrets securely in machine learning projects.
Week 8: Continuous Integration and Continuous Deployment (CI/CD) for ML
Outline:
- Setting up CI/CD pipelines in machine learning projects.
- Dockerfiles and container management for machine learning.
Key Learning Outcomes:
- Establish CI/CD pipelines for machine learning projects.
- Utilize Dockerfiles and containers for machine learning deployment.
Week 9: Containerization and Orchestration
Outline:
- Deep dive into Docker and containerd for ML applications.
- Kubernetes for orchestrating machine learning models.
Key Learning Outcomes:
- Master containerization techniques for machine learning applications.
- Implement Kubernetes for machine learning model orchestration.
Week 10: Building Reproducible ML Pipelines
Outline:
- Introduction to DVC and best practices for data versioning.
- Experiment tracking and model versioning with Weights & Biases.
Key Learning Outcomes:
- Implement DVC for data and model versioning.
- Use Weights & Biases for experiment tracking and model versioning.
Week 11: Monitoring and Continual Learning in Production
Outline:
- Implementing monitoring solutions with Grafana.
- Strategies for continual learning and model updating.
Key Learning Outcomes:
- Deploy monitoring solutions for machine learning models.
- Apply strategies for continual learning and model improvement.
Week 12: Data Management and Experimentation
Outline:
- Techniques for efficient training data management and annotation.
- Advanced experiment tracking with Weights & Biases.
Key Learning Outcomes:
- Implement effective data management and annotation techniques.
- Apply advanced Weights & Biases features for experiment tracking.
Week 13: Serverless Machine Learning and LLMOps
Outline:
- Introduction to serverless computing for ML deployments.
- Overview of LLMOps for deploying large language models.
Key Learning Outcomes:
- Understand serverless computing in the context of machine learning.
- Grasp the concept of LLMOps for efficient deployment of large language models.
Week 14: Real-world Applications and Case Studies
Outline:
- Deploying a voice-based chatbot with BentoML, LangChain, and Gradio.
- Guest lectures on MLOps challenges and solutions.
Key Learning Outcomes:
- Apply course concepts to deploy real-world machine learning applications.
- Learn from industry experts about practical MLOps challenges and solutions.
Week 15: Final Project
Outline:
- Development and deployment of a machine learning application on a cloud server, incorporating course learnings.
Key Learning Outcomes:
- Demonstrate the ability to design, build, and deploy a machine learning system.
- Integrate MLOps, DevOps, and security practices into a real-world project.
Grading
- Class participation: 10%
- Assignments: 30%
- Term project: 60%
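As an illustration of how these weights combine, the sketch below computes a weighted final score. It assumes each component is scored on a 0 to 100 scale, which the syllabus does not state; the dictionary keys and the `final_score` function name are hypothetical and chosen only for this example.

```python
# Weights taken from the Grading list above.
# The key names are hypothetical labels, not official course terminology.
WEIGHTS = {
    "participation": 0.10,
    "assignments": 0.30,
    "term_project": 0.60,
}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted sum of component scores.

    Assumes every component score is on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"participation": 90, "assignments": 80, "term_project": 85}
    print(round(final_score(example), 2))
```

With the sample scores shown, the components contribute 9, 24, and 51 points respectively, for a total of 84 out of 100. The term project alone accounts for more than half of that total.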
Textbook and Resources
No required textbook. Resources will be provided, including access to:
- Lecture notes from https://lecture.halla.ai/lectures/mlops/index.html.