The course aims at a perfect balance between theory and practice. Therefore, prerequisites include:
- Python
- Math
- Software requirements
Basic Python skills are required: writing loops, functions, classes, etc. Completing an interactive tutorial such as DataQuest, DataCamp, or even Codecademy will suffice. A deeper knowledge of Python is welcome, though, since some tasks ask you to implement an ML algorithm from scratch.
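As a rough gauge of the expected level, here is a minimal sketch of a 1-nearest-neighbor classifier written from scratch with a function, a class, and plain loops. The names euclidean_distance and NearestNeighborClassifier, as well as the toy data, are hypothetical illustrations and are not taken from the course assignments.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestNeighborClassifier:
    """Toy 1-nearest-neighbor classifier (hypothetical example, not course code)."""

    def fit(self, X, y):
        # "Training" only memorizes the data.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for point in X:
            # Index of the stored training point closest to `point`.
            best_index = min(
                range(len(self.X_)),
                key=lambda i: euclidean_distance(point, self.X_[i]),
            )
            predictions.append(self.y_[best_index])
        return predictions


if __name__ == "__main__":
    X_train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    y_train = ["a", "a", "b", "b"]
    model = NearestNeighborClassifier().fit(X_train, y_train)
    print(model.predict([(0.2, 0.4), (5.5, 4.8)]))
```

If reading and writing code like this feels comfortable, your Python is likely sufficient for the course.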
Knowledge of basic concepts from calculus, linear algebra, probability theory, and statistics is also required. If you need to catch up, good resources are Part I of the “Deep Learning” book and “Mathematics for Machine Learning”. For a deeper dive, take a look at the MIT courses.
We have prepared a Docker image with all the software required to run the lecture notebooks and assignments, and we validate our notebooks against this image. We synchronize package versions with the Kaggle Docker image and freeze them just before the course starts.
You may want to use this image (recommended). Otherwise, the Anaconda 3 distribution is the best option (it contains a recent Python with Jupyter and lots of other libraries). However, some other packages are also used, such as Vowpal Wabbit. In addition, the Graphviz library must be installed. Installing some of them on Windows might be painful.
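If you go without the Docker image, a quick way to see whether the extra command-line tools are available is to look them up on your PATH. The sketch below assumes the usual executable names, dot for Graphviz and vw for Vowpal Wabbit; the helper find_command_line_tools is a hypothetical name used only for illustration.

```python
import shutil


def find_command_line_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names: "dot" (Graphviz) and "vw" (Vowpal Wabbit).
    for name, path in find_command_line_tools(["dot", "vw"]).items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

A missing entry only means the executable is not on PATH for the current user; the tool may still be installed elsewhere.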
To summarize, you’ve got several alternatives to set up your learning environment:
- Kaggle Kernels & Azure ML
- Pip & Anaconda
- Docker
Kaggle Kernels & Azure ML
The easiest way to start working with the course materials (no local software installation needed) is to visit the mlcourse.ai Kaggle Dataset and fork some Kernels (please keep them private). All your Jupyter notebooks run live in your browser with Anaconda preinstalled. Almost all the needed datasets are there as well. However, uploading other datasets might be tiresome.
Pip & Anaconda
Most Python packages, such as Sklearn, can be installed manually with pip, the Python package installer. However, the preferred option is to use Anaconda. Additionally, you’ll need Vowpal Wabbit and (maybe) CatBoost for competitions.
All necessary software is already installed and distributed in the form of a Docker image. Instructions:
Docker on Linux and macOS
- install Docker
- add your user to the docker group:
sudo usermod -aG docker your_user_name
- install git using your OS package manager
- clone (or download) the mlcourse.ai repository
- in the terminal, cd into the mlcourse.ai directory and run bash run_docker_jupyter.sh. The first run might take 5-10 minutes while the image downloads
- point your browser to localhost:4545. You should see the files from the mlcourse.ai folder
- to test your setup, click on check_docker.ipynb and execute all cells to make sure all the libraries are installed and work fine.
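For a similar sanity check in any environment, the idea of "make sure all the libraries are installed" can be sketched in a few lines of Python. This is not the content of check_docker.ipynb; the helper name check_imports and the module list are assumptions chosen for illustration.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: version string or failure message}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # report any import failure, not only ImportError
            report[name] = f"FAILED ({exc.__class__.__name__}: {exc})"
        else:
            report[name] = getattr(module, "__version__", "version unknown")
    return report


if __name__ == "__main__":
    # Assumed module list; adjust it to the packages you actually need.
    modules = ["numpy", "pandas", "sklearn", "matplotlib"]
    for name, status in check_imports(modules).items():
        print(f"{name}: {status}")
```

Any line starting with FAILED points to a package that still needs to be installed or repaired.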
Docker on Windows
If you meet the following requirements, install Docker for Windows:
- Windows 10 64-bit: Pro, Enterprise, or Education (1607 Anniversary Update, Build 14393 or later).
- Virtualization is enabled in BIOS. Typically, virtualization is enabled by default. This is different from having Hyper-V enabled. For more detail, see “Virtualization must be enabled” in Troubleshooting.
- At least 4 GB of RAM.
It’s not the end of the world if you can’t meet these requirements. You can still use Docker Toolbox, a good official alternative with fewer requirements than Docker for Windows. There are slight differences between Docker for Windows and Docker Toolbox for the end user, but you can safely use either for now.
When you run the installer, it may offer to install git as well. Tick this option if you don’t have git on your system.
In the case of Docker Toolbox, you may or may not need to delete your existing VirtualBox installation.
Once the installation is complete, open Docker (with Docker Toolbox, open the Docker CLI, which is called Docker Quickstart Terminal) and type:
docker run hello-world
It should run without errors.
Open a command-line terminal and clone the course repo:
git clone https://github.com/Yorko/mlcourse.ai
Warning for Docker Toolbox users: you must put your repo in your home directory, i.e. C:\Users\%username%\mlcourse.ai, otherwise run_docker_jupyter_windows.cmd won’t work. There is a workaround for a different location, but we don’t assist with it.
Change to the mlcourse.ai directory with cd mlcourse.ai and run run_docker_jupyter_windows.cmd. Take note of the local address the notebook reports, and point your browser to this address. With Windows 10 and Hyper-V, it should just be http://localhost:4545. With Docker Toolbox, the address is different. We implemented autostart of your default browser with the correct address, but beware that it may not work in Internet Explorer or Edge (for an unknown reason). In that case, use Firefox or Chrome.
In the browser, you should see the directory tree from your mlcourse.ai folder. Click on check_docker.ipynb and execute all cells to make sure all the libraries are installed and work fine.
- Typically, Docker images need a lot of disk space. The official mlcourse image requires about 6 GB.
- use git pull to get new files from the repo into your locally downloaded repo (docker pull, in contrast, only updates the Docker image).
- when you work with an assignment notebook, duplicate it first, and work with the duplicate. This way it’s easier to pull changes to the repo since there will be no conflicts on the file level. If you’d like to work on a lecture notebook, do the same.
- You can install additional packages right in the Jupyter notebook with pip install --user your_new_package. They will be installed in the mlcourse.ai/home folder and will persist across Jupyter restarts.
- optionally, you can modify the docker_files/Dockerfile file, build a new image locally (with docker build -t <tag_name>), and run run_docker_jupyter.sh <tag_name> (only supported under Linux/macOS).
- Docker documentation is full of concise and clear examples.
A few useful commands:
docker ps – list running containers (add -a to list all containers)
docker stop $(docker ps -a -q) – stop all containers
docker rm $(docker ps -a -q) – remove all containers
docker images – list all Docker images
docker rmi <image_id> – remove a Docker image
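Regarding the pip install --user note above: if you want to confirm where user-level packages end up for the interpreter behind your notebook, the standard site module can report it. This is a minimal sketch; the helper name describe_user_install_location is hypothetical, and the exact paths depend on how your environment (Docker, Anaconda, or plain pip) sets the home directory.

```python
import site
import sys


def describe_user_install_location():
    """Return where `pip install --user` places packages for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # ENABLE_USER_SITE may be True, False, or None; normalize it to a bool.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # The directory is added to sys.path only if it exists at startup.
        "user_site_on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in describe_user_install_location().items():
        print(f"{key}: {value}")
```

If user_site_on_sys_path is False right after the first --user install, restarting the notebook kernel usually makes the new directory visible.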
Regardless of the environment (pip, Kaggle Kernels/Azure, or Docker), you’ll work with Jupyter notebooks. If you are new to them, take a look at jupyter.org. In a nutshell, a notebook is a way of mixing code, graphics, Markdown, LaTeX, etc. in a single development environment. It is perfect for sharing your work and ideas, for prototyping, and for working with educational materials.
To start working with the course materials (i.e. Jupyter notebooks), download or clone this repo and run jupyter-notebook from the downloaded mlcourse.ai directory.
Apart from installing the environment, it’s highly recommended that you familiarize yourself with GitHub and bash. And with Docker, of course, if you choose this way of setting up your environment.