Chen Li


Machine Learning Notes: Workflow & Tips

(Please refer to Wow It Fits! — Secondhand Machine Learning.)

Here are my notes from Zero to Mastery Learn PyTorch for Deep Learning. For a cheatsheet, see PyTorch Cheatsheet - Zero to Mastery Learn PyTorch for Deep Learning, Create a training/testing loop, or the PyTorch documentation.

§1 Workflow

Most of the time it’s necessary to subclass the classes mentioned above (for example torch.nn.Module); check the PyTorch documentation for the details.
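
For instance, a minimal sketch of subclassing torch.nn.Module (the class name and layer sizes here are made up for illustration):

import torch
from torch import nn

class TinyModel(nn.Module): # hypothetical example model
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(in_features=8, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # forward() must be overridden to define the computation
        return self.layer(x)

model = TinyModel()
print(model(torch.rand(4, 8)).shape) # torch.Size([4, 1])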

§2 Tensor Errors

See The Three Most Common Errors in PyTorch - Zero to Mastery Learn PyTorch for Deep Learning.

  1. Shape, e.g. [H, W, C] (usually used in numpy or matplotlib.pyplot) vs. [C, H, W] (usually used in torch) vs. [batch_size, C, H, W].

  2. Device, e.g. a tensor on "cpu" being used in an operation with a tensor on "cuda".

  3. Type, e.g. a torch.float32 tensor meeting a torch.uint8 one (all three are illustrated in the sketch after this list).
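
A minimal sketch of checking all three attributes (the tensor here is a made-up stand-in for an image):

import torch

x = torch.rand(3, 224, 224) # [C, H, W], the usual torch image layout

# 1. Shape: permute to [H, W, C] before plotting with matplotlib
x_hwc = x.permute(1, 2, 0)

# 2. Device: operands of one operation must live on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)

# 3. Type: cast explicitly before mixing dtypes
x_uint8 = (x * 255).to(torch.uint8)

print(x.shape, x.device, x.dtype) # the three attributes behind the three errors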

§3 Jupyter Notebook

  • Markdown cells for notes.
  • Markdown titles (#, ##, …) to structure the notebook.
  • Printing the output of a function: the last expression of a cell is displayed automatically (a sketch follows).
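
A trivial sketch of the last point (assuming it refers to Jupyter’s automatic display of a cell’s last expression):

def greet(name: str) -> str:
    return f"Hello, {name}!"

greet("PyTorch") # in a notebook cell, the return value is displayed without print()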

§4 torchinfo & torch.utils.tensorboard

To use torchinfo to check the structure of a model:

from torchinfo import summary

model = VisionCNN() # summary() expects a model instance, not the class itself

summary(model=model)

# or

summary(model=model,
        input_size=(32, 3, 224, 224), # (batch_size, C, H, W)
        col_names=["input_size", "output_size", "num_params"],
        col_width=20, # column width
        row_settings=["var_names"])
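
Note that when input_size is given, torchinfo performs a forward pass with a dummy tensor of that size, so the per-layer input/output shapes in the table come from actually running the model.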

To use torch.utils.tensorboard, see 07. PyTorch Experiment Tracking - Zero to Mastery Learn PyTorch for Deep Learning. Here’s an example of a train() function whose printed log is like a simple version of what TensorBoard tracks:

import torch
from typing import Dict, List
from torch.utils.tensorboard.writer import SummaryWriter
from tqdm.auto import tqdm

# Add writer parameter to train()
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int,
          device: torch.device,
          writer: SummaryWriter # new parameter to take in a writer
          ) -> Dict[str, List]:
    """Trains and tests a PyTorch model.

    Passes a target PyTorch model through train_step() and test_step()
    functions for a number of epochs, training and testing the model
    in the same epoch loop.

    Calculates, prints and stores evaluation metrics throughout.

    Stores metrics to specified writer log_dir if present.

    Args:
      model: A PyTorch model to be trained and tested.
      train_dataloader: A DataLoader instance for the model to be trained on.
      test_dataloader: A DataLoader instance for the model to be tested on.
      optimizer: A PyTorch optimizer to help minimize the loss function.
      loss_fn: A PyTorch loss function to calculate loss on both datasets.
      epochs: An integer indicating how many epochs to train for.
      device: A target device to compute on (e.g. "cuda" or "cpu").
      writer: A SummaryWriter() instance to log model results to.

    Returns:
      A dictionary of training and testing loss as well as training and
      testing accuracy metrics. Each metric has a value in a list for 
      each epoch.
      In the form: {train_loss: [...],
                train_acc: [...],
                test_loss: [...],
                test_acc: [...]} 
      For example if training for epochs=2: 
              {train_loss: [2.0616, 1.0537],
                train_acc: [0.3945, 0.3945],
                test_loss: [1.2641, 1.5706],
                test_acc: [0.3400, 0.2973]} 
    """
    # Create empty results dictionary
    results = {"train_loss": [],
               "train_acc": [],
               "test_loss": [],
               "test_acc": []
    }

    # Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                          dataloader=train_dataloader,
                                          loss_fn=loss_fn,
                                          optimizer=optimizer,
                                          device=device)
        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        # Print out what's happening
        print(
          f"Epoch: {epoch+1} | "
          f"train_loss: {train_loss:.4f} | "
          f"train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f} | "
          f"test_acc: {test_acc:.4f}"
        )

        # Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)


        ### New: Use the writer parameter to track experiments ###
        # If a writer was passed in, log this epoch's metrics to it
        if writer:
            # Add results to SummaryWriter
            writer.add_scalars(main_tag="Loss",
                               tag_scalar_dict={"train_loss": train_loss,
                                                "test_loss": test_loss},
                               global_step=epoch)
            writer.add_scalars(main_tag="Accuracy",
                               tag_scalar_dict={"train_acc": train_acc,
                                                "test_acc": test_acc},
                               global_step=epoch)
    ### End new ###

    # Close the writer only after the last epoch has been logged
    if writer:
        writer.close()

    # Return the filled results at the end of the epochs
    return results
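
A minimal usage sketch (assumptions: train_step() and test_step() are the per-epoch helpers from the course’s going_modular code, model, train_dataloader and test_dataloader are already defined, and the log_dir name is made up):

from torch.utils.tensorboard import SummaryWriter

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
writer = SummaryWriter(log_dir="runs/vision_cnn") # hypothetical experiment directory

results = train(model=model,
                train_dataloader=train_dataloader,
                test_dataloader=test_dataloader,
                optimizer=torch.optim.Adam(params=model.parameters(), lr=1e-3),
                loss_fn=torch.nn.CrossEntropyLoss(),
                epochs=5,
                device=device,
                writer=writer)

# Then inspect the logs with: tensorboard --logdir runs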

§5 More

In terms of “vibe”, here’s what I find interesting in the course and will try to carry into my own communication and teaching (if possible):

  • It’s like a cooking show: keep it fun and interactive. The way to learn machine learning is to do different projects, and you get better each time.
  • Bugs are fun and valuable; deal with them with positivity. When facing a problem while programming, use search engines, Wikipedia and the official documentation (in this case, the PyTorch documentation).