Machine Learning Systems Design
Course Description
This course explores the design and implementation of machine learning systems, with a focus on the practical considerations of deploying and maintaining models in production. Students will learn how to design, build, and deploy machine learning models and how to integrate machine learning into an organization's broader software development process. The course covers best practices in DevOps and MLOps, including dotfiles, Git, Docker, Kubernetes, CI/CD, model selection, development, training, evaluation, deployment, monitoring, and continual learning. It also covers training data management, experiment tracking, and model versioning with tools such as Weights & Biases.
Learning Objectives
Upon completing this course, students will be able to:
- Design, build, and deploy machine learning systems in a production environment.
- Understand the challenges of integrating machine learning into the broader software development process.
- Develop strong skills in DevOps and MLOps, including dotfiles, Git, Docker, Kubernetes, CI/CD, model selection, development, training, evaluation, deployment, monitoring, and continual learning.
- Effectively manage training data for machine learning models.
- Use Weights & Biases for experiment tracking and model versioning.
- Deploy machine learning applications on cloud servers.
Prerequisites
- Familiarity with Python and machine learning concepts
Course Outline
The following is a tentative outline of the topics to be covered in this course. The order of the topics may be adjusted as needed.
Weeks 1-5: DevOps for Machine Learning
- Overview of machine learning system design and the challenges of building and deploying machine learning models in production.
- Setting up development and deployment environments with dotfiles, Docker, and Kubernetes.
- Best practices for version control, collaboration, and reproducibility in machine learning projects.
- Setting up a CI/CD pipeline for machine learning.
- Security considerations for machine learning models.
- Techniques for securing machine learning models in a production environment.
Weeks 6-8: MLOps: Model Selection, Development, and Deployment
- Best practices for model selection, development, and training, including hyperparameter tuning and cross-validation.
- Building reproducible machine learning pipelines with tools such as DVC.
- Best practices for deploying machine learning models in a production environment, including containerization and serverless computing.
- Deploying machine learning models with Docker and Kubernetes.
Weeks 9-10: MLOps: Model Evaluation and Monitoring
- Best practices for evaluating machine learning models, including metrics and performance analysis.
- Building machine learning monitoring dashboards with tools such as Grafana.
- Best practices for monitoring machine learning models in production, and techniques for continual learning and adaptation.
Weeks 11-12: MLOps: Training Data Management
- Best practices for managing training data for machine learning models.
- Data labeling and annotation techniques for training data.
- Techniques for data versioning and reproducibility.
Weeks 13-14: MLOps: Experiment Tracking and Model Versioning with Weights & Biases
- Introduction to Weights & Biases for experiment tracking and model versioning.
- Best practices for using Weights & Biases to track experiments, visualize results, and version models.
- Case studies and guest lectures on real-world machine learning system design and implementation, including success stories and common pitfalls.
Weeks 15-16: Final Project
- Students will build a toy machine learning application and deploy it on a cloud server.
Grading
Grading will be based on the following:
- Class participation: 10%
- Assignments: 30%
- Final project: 60%
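As a rough illustration of how the weights above combine, the following minimal sketch computes a weighted final grade. It assumes each component is scored on a 0-100 scale, which the syllabus does not state; the names `WEIGHTS` and `final_grade` and the sample scores are hypothetical.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"participation": 0.10, "assignments": 0.30, "final_project": 0.60}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"participation": 90.0, "assignments": 80.0, "final_project": 85.0}
print(round(final_grade(example), 2))  # 84.0
```

Because the final project carries 60% of the weight, a ten-point change in the project score moves the final grade by six points, twice the effect of the same change in assignments.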
Textbook
There is no required textbook for this course. All required readings will be provided in class.
Resources
- Git: https://git-scm.com/
- Docker: https://www.docker.com/
- Kubernetes: https://kubernetes.io/
- Weights & Biases: https://www.wandb.com/
- DVC: https://dvc.org/
- Grafana: https://grafana.com/
- Google Cloud Platform: https://cloud.google.com/