{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Customize training\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/luigibonati/mlcolvar/blob/main/docs/notebooks/tutorials/intro_3_loss_optim.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Outline\n", "\n", "The CVs implemented in `mlcolvar.cvs` are subclasses of `lightning.LightningModule`, which can be thought of as tasks rather than plain models: they also incorporate the optimizer and the loss function used in the training step. In this tutorial you will learn how to customize the different aspects of the training behaviour:\n", "\n", "- optimizer\n", "- loss function\n", "- trainer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Optimizer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The optimizer is returned by the method `configure_optimizers`, which is called by the lightning trainer. The default optimizer is `Adam`. To change it, or to customize the optimizer's arguments, you can set the CV's members `optimizer_name` and `optimizer_kwargs`. \n", "\n", "For instance, this can be used to add an L2 regularization through the `weight_decay` argument." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Colab setup\n", "import os\n", "\n", "if os.getenv(\"COLAB_RELEASE_TAG\"):\n", "    import subprocess\n", "    subprocess.run('wget https://raw.githubusercontent.com/luigibonati/mlcolvar/main/colab_setup.sh', shell=True)\n", "    cmd = subprocess.run('bash colab_setup.sh TUTORIAL', shell=True, stdout=subprocess.PIPE)\n", "    print('Done!')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/etrizio@iit.local/Bin/miniconda3/envs/mlcvs_test/lib/python3.10/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimizer: Adam\n", "Arguments: {'weight_decay': 0.0001}\n" ] } ], "source": [ "from mlcolvar.cvs import RegressionCV\n", "\n", "# define example CV\n", "cv = RegressionCV(layers=[10,5,5,1], options={})\n", "\n", "# choose optimizer (name of a class in torch.optim)\n", "cv.optimizer_name = 'Adam' \n", "\n", "# choose arguments (keyword arguments of the optimizer)\n", "cv.optimizer_kwargs = {'weight_decay' : 1e-4 }\n", "\n", "print(f'Optimizer: {cv.optimizer_name}')\n", "print(f'Arguments: {cv.optimizer_kwargs}')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Options for the default `Adam` optimizer can also be passed through the `options` parameter of the CV model, under the key `optimizer` of the dictionary. 
The provided options will be registered in `optimizer_kwargs`.\n", "\n", "For example, we can set the `lr` and the `weight_decay`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "optimizer_kwargs: {'lr': 0.002, 'weight_decay': 0.0001}\n" ] } ], "source": [ "# define optimizer options\n", "options = {'optimizer' : {'lr' : 2e-3, 'weight_decay' : 1e-4} }\n", "\n", "# define example CV\n", "cv = RegressionCV(layers=[10,5,5,1], options=options)\n", "\n", "print(f'optimizer_kwargs: {cv.optimizer_kwargs}')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can also associate a **learning rate scheduler** with the optimizer, which modifies the learning rate as the optimization proceeds to facilitate the training, for example by reducing it as a function of the epochs.\n", "\n", "To do this we can use the schedulers implemented in `torch.optim.lr_scheduler`.\n", "\n", "The scheduler is also passed through the `options` parameter of the CV model, under the key `lr_scheduler` of the dictionary. \n", "The scheduler class needs to be included under the key `scheduler`, and the parameters of the chosen scheduler should be passed under the corresponding names.\n", "\n", "When using a LR scheduler, the learning rate is saved into the metrics logged by `MetricsCallback` (as `lr`)." 
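, "

To get a feel for how strongly a multiplicative scheduler acts, the decay applied by `ExponentialLR` can be evaluated by hand: at every scheduler step the learning rate is multiplied by `gamma`. The snippet below is a plain-Python sketch rather than part of `mlcolvar`; the helper `exponential_lr` is hypothetical, the starting value `1e-3` is an assumption (it is the PyTorch default for `Adam`), and one scheduler step per epoch is assumed.

```python
def exponential_lr(initial_lr, gamma, epoch):
    # learning rate after `epoch` scheduler steps: initial_lr * gamma**epoch
    return initial_lr * gamma ** epoch

initial_lr = 1e-3  # assumed starting learning rate
for gamma in (0.9999, 0.99):
    for epoch in (0, 100, 1000):
        lr = exponential_lr(initial_lr, gamma, epoch)
        print(f'gamma={gamma}, epoch={epoch}: lr={lr:.3e}')
```

With `gamma=0.9999` the learning rate is still about 90% of its starting value after 1000 steps, whereas `gamma=0.99` reduces it by more than four orders of magnitude over the same span, so the value of `gamma` should be chosen with the expected number of epochs in mind."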
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import torch \n", "# choose the scheduler\n", "lr_scheduler = torch.optim.lr_scheduler.ExponentialLR # requires gamma as parameter\n", "\n", "# define scheduler options\n", "options = {'lr_scheduler' : { 'scheduler' : lr_scheduler, 'gamma' : 0.9999} }\n", "\n", "# define example CV\n", "cv = RegressionCV(layers=[10,5,5,1], options=options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For schedulers that require a monitored metric (e.g., `ReduceLROnPlateau`), a `lr_scheduler_config`\n", "dictionary can be passed as well, for example: `{'monitor': 'valid_loss'}`. This dictionary is merged into the Lightning\n", "scheduler config alongside the `scheduler` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example: scheduler that requires a monitored metric (ReduceLROnPlateau)\n", "from torch.optim.lr_scheduler import ReduceLROnPlateau\n", "\n", "options = {\n", "    'lr_scheduler': {\n", "        'scheduler': ReduceLROnPlateau,\n", "        'mode': 'min',\n", "        'factor': 0.5,\n", "        'patience': 5,\n", "    },\n", "    'lr_scheduler_config': {\n", "        'monitor': 'valid_loss',\n", "    },\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Loss function" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The set of operations performed at each optimization step is encoded in the method `training_step` of each CV. They typically involve:\n", "1. a forward pass of the model\n", "2. the calculation of the loss function\n", "3. a backward pass\n", "\n", "The general workflow is specific to each CV and cannot be changed unless you subclass a given CV and override the `training_step` method. However, some details can be customized." 
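, "

The three steps listed above can be made concrete with a toy example. The sketch below is plain Python and deliberately avoids Lightning and `mlcolvar`: it fits a one-parameter model `y = w * x` with a mean squared error loss, and the function `toy_training_step`, the data and the learning rate are all illustrative assumptions. In an actual CV the gradients are computed by PyTorch autograd and the parameter update is carried out by the optimizer.

```python
def toy_training_step(w, xs, ys, lr=0.1):
    n = len(xs)
    # 1. forward pass of the model y = w * x
    preds = [w * x for x in xs]
    # 2. loss function: mean squared error
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # 3. backward pass: analytical gradient of the loss with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    # parameter update (the role played by the optimizer)
    return w - lr * grad, loss

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data generated with w = 2
w = 0.0
for step in range(50):
    w, loss = toy_training_step(w, xs, ys)
print(f'w = {w:.4f}, loss = {loss:.2e}')
```

Replacing the expression in step 2 is all that is needed to train against a different loss, which is the kind of customization described in the rest of this section."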
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For example, one might want to change the loss function in a `RegressionCV` (or in an `AutoEncoderCV`) from Mean Squared Error (MSE) to Mean Absolute Error (MAE). To do so, one needs to define a function with the same signature as the one used in the CV and then assign it to the `loss_fn` member:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "default: at 0x7f89c069c670>\n", "(a) new: \n", "(b) new: at 0x7f89c069c670>\n" ] } ], "source": [ "from torch import Tensor\n", "\n", "# print default loss\n", "print(f'default: {cv.loss_fn}' )\n", "\n", "# define new function: mean absolute error between prediction and target\n", "def mae_loss(input : Tensor, target: Tensor):\n", "    return (input - target).abs().mean()\n", "\n", "# assign it\n", "cv.loss_fn = mae_loss\n", "\n", "print(f'(a) new: {cv.loss_fn}' )\n", "\n", "# this could also be accomplished with a lambda function\n", "cv.loss_fn = lambda x,y : (x-y).abs().mean()\n", "\n", "print(f'(b) new: {cv.loss_fn}' )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Another case is when the loss function itself has options that can be customized. For instance, in the case of DeepLDA/DeepTICA CVs the loss function is `ReduceEigenvaluesLoss`, which takes as input the eigenvalues of the underlying statistical problem and returns a scalar (e.g. the sum of eigenvalues squared). 
To see the variables that can be set, you should look at the documentation of the loss function used.\n", "\n", "For example, to change the reduction mode to the sum of the eigenvalues instead of the sum of the squared ones, one can update the loss function accordingly:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "default mode: sum2\n", ">> new mode: sum\n" ] } ], "source": [ "from mlcolvar.cvs import DeepTICA\n", "\n", "# define CV\n", "cv = DeepTICA(layers=[10, 5, 5, 2], options={})\n", "\n", "# print default loss mode\n", "print(f'default mode: {cv.loss_fn.mode}')\n", "\n", "# change the mode\n", "cv.loss_fn.mode = 'sum' \n", "\n", "# print new loss mode\n", "print(f'>> new mode: {cv.loss_fn.mode}')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Trainer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Since we are using the PyTorch Lightning framework, we can exploit all the benefits of this library. For instance, we can run the optimization of the model on GPUs, if available, with no change to our code. " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: False, used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "import lightning\n", "\n", "# choose accelerators\n", "trainer = lightning.Trainer(accelerator='cpu') # options are: \"cpu\", \"gpu\", \"tpu\", \"ipu\", \"hpu\", \"mps\", \"auto\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "An important class of tools that can be used to customize the behaviour during training are **callbacks**. 
\n", "\n", "Quoting the lightning documentation: *[Callbacks](https://lightning.ai/docs/pytorch/latest/extensions/callbacks.html) allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in the lightning module and can be shared across projects.*\n", "\n", "For instance, they can be used to perform early stopping, to save model checkpoints or to save metrics. Here we only give some examples of these functionalities and refer the reader to the lightning [documentation](https://lightning.ai/docs/pytorch/latest/extensions/callbacks.html) for a more detailed overview." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Early stopping" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Early stopping allows the training to be stopped when a monitored metric (typically the validation loss) no longer improves, i.e. it stops decreasing (or increasing, for metrics that should be maximized), which is a symptom of overfitting." 
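, "

The stopping criterion can be illustrated without running any training. The following is a simplified, pure-Python sketch of a patience-based rule for a metric that should be minimized; it is not Lightning's actual implementation, and the function `stopping_epoch` as well as the list of validation losses are hypothetical.

```python
def stopping_epoch(losses, patience=3, min_delta=0.0):
    # return the index of the epoch at which training would stop, or None
    best = float('inf')
    wait = 0  # number of checks since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: store it and reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# hypothetical validation losses, one value per epoch
valid_loss = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73, 0.69]
print(stopping_epoch(valid_loss, patience=3))
print(stopping_epoch(valid_loss, patience=10))
```

A small `patience` stops the run as soon as the metric plateaus, while a larger one tolerates temporary fluctuations (here, the later improvement to 0.69 is only reached with the larger value), at the cost of a longer training."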
] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: False, used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "from lightning.pytorch.callbacks.early_stopping import EarlyStopping\n", "\n", "early_stopping = EarlyStopping(monitor=\"valid_loss\", # quantity to monitor\n", "                               mode='min', # whether the quantity should be minimized or maximized during training\n", "                               min_delta=0, # minimum change of the quantity that counts as an improvement\n", "                               patience=10, # how many checks without improvement to wait before stopping the training\n", "                               verbose=False \n", "                               )\n", "\n", "trainer = lightning.Trainer(callbacks=[early_stopping])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Model checkpointing" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "It is often useful to save the checkpoint of the model which performs best according to some metric. This is useful, for instance, in combination with early stopping.\n", "\n", "After training finishes, you can use `best_model_path` to retrieve the path to the best checkpoint file and `best_model_score` to retrieve its score." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: False, used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "from lightning.pytorch.callbacks.model_checkpoint import ModelCheckpoint\n", "\n", "# see documentation for additional customization, e.g. 
location and file names, etc.\n", "checkpoint = ModelCheckpoint(save_top_k=1, # number of models to save (save_top_k=1 means only the best one is stored)\n", "                             monitor=\"valid_loss\" # quantity to monitor\n", "                             )\n", "\n", "# assign callback to trainer\n", "trainer = lightning.Trainer(callbacks=[checkpoint])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "After the training is over, remember to also export the TorchScript model, which is needed by PLUMED. The following code first loads the best checkpoint and then compiles it." ] }, { "attachments": {}, "cell_type": "raw", "metadata": {}, "source": [ "best_model = RegressionCV.load_from_checkpoint(checkpoint.best_model_path)\n", "best_model.to_torchscript(file_path = checkpoint.best_model_path.replace(\".ckpt\",\".ptc\"), method='trace')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Loggers" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Lightning supports numerous ways of logging metrics, from saving CSV files to TensorBoard to Weights & Biases and more (see their website for the full list).\n", "\n", "For instance, to save the metrics in a .csv file you can use the `CSVLogger`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from lightning.pytorch.loggers import CSVLogger\n", "\n", "logger = CSVLogger(save_dir=\"experiments\", # directory where to save file\n", "                   name='myCV', # name of experiment\n", "                   version=None # version number (if None it will be automatically assigned)\n", "                   )\n", "\n", "# assign logger to trainer\n", "trainer = lightning.Trainer(logger=logger)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Or again, the following snippet can be used to save the metrics in the TensorBoard format (requires `tensorboard` to be installed):" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "from 
lightning.pytorch.loggers import TensorBoardLogger\n", "\n", "logger = TensorBoardLogger(save_dir=\"experiments\", # directory where to save file\n", "                           name='myCV', # name of experiment\n", "                           version=None # version number (if None it will be automatically assigned)\n", "                           )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding new callbacks: save metrics into a dictionary" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Callbacks can also be easily implemented in order to perform custom tasks. \n", "\n", "For instance, in `mlcolvar.utils.trainer` we implemented a simple `MetricsCallback` object which saves the logged metrics into a dictionary. This makes it easy to display the results in the tutorials without having to save the metrics with the loggers and load them back afterwards. " ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: False, used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n" ] } ], "source": [ "from mlcolvar.utils.trainer import MetricsCallback\n", "\n", "log = MetricsCallback()\n", "\n", "# assign callback to trainer\n", "trainer = lightning.Trainer(callbacks=[log])\n", "\n", "# After the training is over, the metrics can be accessed through the dictionary log.metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Disable validation loop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to disable the validation loop you need to:\n", "1. tell the `DictModule` not to split the dataset, with `lengths=[1.0]`\n", "2. 
pass the two options below to the `lightning.Trainer`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# from mlcolvar.data import DictModule\n", "\n", "# datamodule = DictModule(dataset, lengths=[1.0])\n", "\n", "trainer = lightning.Trainer(limit_val_batches=0, num_sanity_val_steps=0)" ] } ], "metadata": { "kernelspec": { "display_name": "pytorch", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "1cbeac1d7079eaeba64f3210ccac5ee24400128e300a45ae35eee837885b08b3" } } }, "nbformat": 4, "nbformat_minor": 2 }