Chen Li


Machine Learning Notes: fastai

(Please refer to Wow It Fits! — Secondhand Machine Learning.)

Compared with tensorflow, mxnet, paddle, or pure numpy (just for the fun of it), torch is probably the easiest Machine Learning package, and to make it even easier, let’s take a look at fastai.

By the way, I subscribe to GitHub Trending by RSS, and the other day I got these two at the same time. Machine Learning in numpy is really cool, but the second one is like, why? … these two target markets do NOT overlap.

[Image: ml-in-np-python-in-excel]

Back to the topic, here are my notes from fastai, fastbook (github.com), and Practical Deep Learning for Coders. The main idea of fastai is to make Machine Learning accessible to every individual so that it can be applied to a wide range of subjects. To do so:

  • Use Google Colab, Kaggle, Paperspace, or Hugging Face.
  • Require only one GPU (or at least try to).
  • fastai starts with high-level APIs (see fastai - Quick start) and then digs deeper, so it stays customizable. Nonetheless, some of the high-level pieces are so useful that you will see them in almost any project that uses fastai, for example DataLoaders and Learner.

Tabular Data

Random Forest is a solid baseline, and sometimes even the best method. See sklearn.ensemble.RandomForestClassifier — scikit-learn 1.3.0 documentation.
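As a minimal sketch of that baseline (on synthetic data; all the dataset and parameter choices here are illustrative, not from the notes):

```python
# Random Forest as a tabular baseline, using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data: 1000 rows, 20 features, binary target.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# A stock forest with defaults is often a surprisingly strong baseline.
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)
print(f"validation accuracy: {rf.score(X_valid, y_valid):.3f}")
```

From there, `rf.feature_importances_` gives a quick read on which columns matter before reaching for anything fancier.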

Transfer Learning for CV

See The best vision models for fine-tuning | Kaggle.

Two Categories

Sometimes predicting two categories at the same time has two advantages:

  • Parallel computing, which saves time and computing resources.
  • The categories might help each other: for example, predicting the fish and the boat together may produce better results than predicting only the fish.
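One common way to set this up is a shared backbone with one head per category, trained on the sum of the two losses. A hypothetical torch sketch (the "fish"/"boat" heads, class counts, and layer sizes are all made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    """One shared backbone, two classification heads trained jointly."""
    def __init__(self, n_fish=10, n_boat=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
        self.fish_head = nn.Linear(64, n_fish)  # e.g. fish species
        self.boat_head = nn.Linear(64, n_boat)  # e.g. boat type

    def forward(self, x):
        feats = self.backbone(x)  # shared features serve both tasks
        return self.fish_head(feats), self.boat_head(feats)

model = TwoHeadNet()
x = torch.randn(8, 1, 28, 28)          # a dummy batch of 8 images
fish_logits, boat_logits = model(x)    # both heads in one forward pass

# Joint training: just sum the per-task cross-entropy losses.
fish_y = torch.randint(0, 10, (8,))
boat_y = torch.randint(0, 5, (8,))
loss = F.cross_entropy(fish_logits, fish_y) + F.cross_entropy(boat_logits, boat_y)
```

Because the backbone is shared, both predictions come from one forward pass (the parallelism point), and gradients from each head shape the shared features (the tasks-helping-each-other point).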

Learners

from fastai.vision.all import Learner
learn = Learner(...)

See fastai - Pytorch to fastai details. Here are some useful tricks: